ABSTRACT


The  thesis  develops  and  evaluates  methods  for  evaluation  of  the  architecture  of  instruction 
set  processors  (ISPs).  (An  ISP  is  the  logical  processor  defined  by  the  instruction  set, 
independent  of  physical  implementation).  The  methods  are  based  on  analyzing  traces  of 
program  executions  which  contain  information  about  every  instruction  executed. 

The  main  advantages  of  the  methods  are: 

a)  They  permit  a very  detailed  study  of  ISP  behaviour. 

b)  They  are  not  restricted  to  specific  languages  or  processors. 

c)  They  are  easily  programmed. 

Methods  and  experimental  results  are  presented  for  four  aspects  of  ISP  architecture: 
register  structure,  data  types  and  operators,  control  operators  and  address  calculation. 
These  may  be  evaluated  in  terms  of  four  types  of  costs:  execution  time,  memory  space,  cost 
of  programming,  and  the  cost  of  hardware.  The  methods  presented  are  mostly  concerned 
with  time. 

A set  of  programs,  the  subject  set,  was  used  to  represent  the  ISP  workload.  This  was 
chosen  primarily  to  investigate  the  variations  in  the  results  caused  by  variation  of  language, 
language  implementation,  algorithm,  and  programmer. 

Register  structure  is  investigated  through  the  concept  of  a register  life.  This  is  the  period 
from  when  a register  is  loaded,  until  its  last  use  before  the  next  time  it  is  loaded.  The 
methods  provide  data  relevant  to  two  problems: 

a)  What  is  the  optimal  number  of  registers? 

b)  How  desirable  is  generality  of  registers? 

An  algorithm  is  presented  which  will  find  how  many  registers  are  live  at  each  time  during  the 
program  execution.  This  algorithm  is  extended  to  compute  an  upper  bound  on  the  increase  in 
time  if  the  program  were  to  run  on  an  ISP  with  fewer  registers.  This  computation  is  based 
on  temporarily  storing  registers  that  are  live  but  unused  for  long  periods,  and  on 
interleaving several lives in one register.
The thesis also presents a classification of the operations that may be performed on a
register. This induces a classification of register lives which may be used to assess the need
for generality.

Most  of  the  other  methods  presented  apply  equally  to  data  operators,  control  operators,  and 
addressing.  The  main  problems  are: 

a) How to detect operators that are in the ISP, but not used sufficiently to justify them.
This is done by frequency counts and various derivatives thereof. Particularly
interesting are the frequency results obtained by weighted summation over the whole
subject set.

b)  How  to  detect  operators  that  should  be  included  in  the  ISP.  This  problem  is 
approached  by  studying  instruction  sequences. 

The  main  problem  in  detecting  sequences  is  to  reduce  the  space  and  time  requirements  of 
the  analysis  program.  This  problem  was  solved  by  using  a multi  pass  algorithm.  Each  pass 
extends  the  existing  sequences  by  one  instruction.  After  each  pass,  heuristic  methods  are 
used  to  discard  insignificant  sequences. 

The thesis proposes methods to study operand values, information used for control and
addressing, information related to the addressing problem for tests, and information on use of
indirection.

The most important conclusions drawn about the validity of the methods are: The experimental
results show good internal consistency. Their trend is independent of algorithm and
programming language. They agree well with previous knowledge. The dependence on
language is most important for those languages that use a run time system. The use of data
operators and data structures depends on the algorithm; the register usage does not.

In a subject set for a full scale analysis, the data operators and data structures of the area
of applications should be well represented. The individual subject programs should be large
enough that dominating loops are avoided.


A NOTE ON TERMINOLOGY


By an instruction set processor or ISP we mean the logical processor defined by the
instruction set, as opposed to its physical implementation. Included in the ISP structure are
such things as instruction formats, register structure, instruction interpretation algorithm
(including address calculation), data types and their representation, etc. Computer families,
like  the  IBM  360  and  370  series  and  the  CDC  6000  series  are  examples  of  ISPs  with  several 
different  physical  implementations. 

Obviously  the  logical  structure  can  not  always  be  entirely  divorced  from  its  physical 
counterpart,  nor  is  such  a separation  always  desirable.  There  should  be  no  doubt,  in  our 
further  discussion,  when  we  take  the  physical  aspects  into  account. 


We  use  the  term  ISP  to  mean  the  instruction  set  processor  itself,  not  the  notation  for 
describing  such  processors  defined  by  Bell  and  Newell  ([BelC71]).  As  a concession  to 
readers  unfamiliar  with  it,  we  have  tried  to  avoid  using  this  notation.  The  associated 
terminology,  however,  is  used. 

Italics are used for words that are previously defined. Underlining is used for words that
are  being  defined,  or  otherwise  stressed. 

In  the  tables  of  results,  0 means  an  exact  zero,  0.000  or  similar  constructs  mean  less  than 
1/2000  (in  this  case)  but  not  exactly  0. 

Unless otherwise stated, the term "PDP-10" is used to mean the DECsystem-10 ISP or the
KA10 processor of that system, both described in [DEC71].


CHAPTER 1
INTRODUCTION 


The mirror was given to us humans by God,
but the Devil gave it the flaw
that it can never show how one looks
when one is not looking in a mirror.

Kumbel Kumbell

This thesis is concerned with the architecture of Instruction Set Processors. It identifies the
most  important  parameters  of  such  architectures,  their  interdependence  and  their  associated 
costs.  It  proceeds  to  present  a collection  of  methods  for  evaluating  some  of  these  costs. 
Most  of  the  effort  of  the  thesis  lies  in  developing  these  methods  and  studying  their 
performance  for  one  ISP  and  a set  of  programs  (a  subject  set)  running  on  that  ISP. 

Our point of view is that of the programmer, or maybe more correctly, that of the program
being executed. The goal of our methods is to evaluate the features of ISPs in terms of their
utility to the program (or programmer). Thus the questions that they will attempt to answer
can be generalized to: "How well does the programmer/compiler utilize the features made
available to him through the instruction set? Which of these features should be removed or
changed? Which should be added?"

The  methods  are  based  on  analyzing  traces  of  programs  being  executed,  where  the  trace 
contains  information  about  every  instruction  executed  by  the  program.  The  analysis  is 
performed  by  separate  programs,  and  is  thus  completely  disjoint  from  the  writing  of  the 
trace.  Most  of  the  methods  presented,  and  certainly  the  most  important  ones,  have  been 
implemented  as  programs  and  used  in  experiments.  The  experimental  results  agree  well  with 
previous knowledge and with intuition, and are also consistent among themselves. Hence the
experimental  evidence  supports  the  validity  of  the  methods. 

The  experimental  results  that  we  present  are  from  experiments  designed  primarily  to 
evaluate  the  methods,  not  the  ISP  that  we  have  worked  on.  In  particular  the  programs  we 
have  analyzed  are  small,  and  from  a restricted  application  area.  Hence,  although  many  of  our 
results certainly permit valid conclusions about the ISP we have worked with*, our set of
subject programs has been too restricted to provide the basis for a valid, full scale
evaluation of a general purpose ISP.

* The PDP-10


1.1  Overview  of  the  thesis 

This  introductory  chapter  presents  an  overview  of  the  basic  ideas  of  the  methods.  It  then 
gives  a survey  of  related  work  and  relates  our  work  to  this. 

In  Chapter  2 we  present  the  types  of  cost  associated  with  implementing  and  using  (or  not 
using)  ISP  features,  and  discuss  their  relationship. 

Chapter 3 describes the major sources of errors and variation that might influence our
experimental  results,  and  describes  how  we  selected  a set  of  subject  programs  to  evaluate 
these  influences. 

Chapters  4 through  7 contain  the  core  of  the  thesis.  In  those  chapters  we  analyze  the 
instruction  set  processor,  concentrating  on  those  features  for  which  we  have  developed 
methods  of  evaluation.  The  order  of  presentation  is: 

Chapter  4:  Register  structure 
Chapter  5:  Data  types  and  their  operators 
Chapter  6:  Control  operators 
Chapter  7:  Address  calculation 

Each  chapter  is  further  divided  into  sections,  each  discussing  a different  feature  or  aspect  of 
the  chapter  topic.  For  each  feature,  we  discuss  the  motivation  for  having  this  feature,  and 
the  costs  and  tradeoffs  associated  with  it.  Our  methods  for  estimating  some  of  these  costs 
are  described,  and  experimental  results  are  presented  where  applicable.  For  each  method  its 
limitations,  sources  of  errors,  and  dependencies  on  the  various  sources  of  variation,  as 
presented  in  Chapter  3,  are  discussed. 

For our analysis we rely heavily on the multidimensional computer space presented by Bell
and Newell [BelC71]. The dimensions of this space represent such things as intended
application,  technology,  word  size,  etc.,  and  possess  several  levels  of  detail.  We  have  made 
this  structure  finer  or  coarser  to  suit  our  needs,  and  will  use  it  freely  below  without  further 
reference  to  its  origin. 

The most important dimensions for classification of instruction set processors are (with those
most highly related on the same line):

Computer (system) function
Processor  function 

Memory  accessing  algorithm  - primary  memory  size 
Addresses  per  instruction  - M.processor  state 
Word  size  - number  base  - data  types 
Control  structures 

As stated in Section 3.1, we take the computer and processor functions to be given, i.e.
we investigate general purpose computers with a bias towards scientific calculations. Each of
the next four coordinates above corresponds to one of the four chapters listed.

The  last  chapter  summarizes  the  results  and  points  out  areas  for  future  research. 

The  thesis  describes  two  processes  more  or  less  in  parallel.  One  is  the  development  of  the 
methods  and  their  use  to  evaluate  ISP  architecture  in  terms  of  the  costs  discussed  in  Chapter 
2.  The  other  is  the  evaluation  of  the  methods  themselves,  in  terms  of  the  framework 
described  in  Chapter  3.  Both  processes  go  on  through  Chapters  4 to  7,  and  conclude  in 
Chapter  8. 


1.2  The  problem 

Several  approaches  may  be  used  to  improve  the  performance  of  computers.  These  are  to  a 
large  extent  orthogonal  and  are  often  combined,  as  exemplified  by  many  current  commercial 
designs. 

One  approach  is  to  use  faster  circuit  technology  for  a brute  force  increase  of  speed,  leaving 
the  ISP  architecture  unchanged.  This  approach  is  of  no  interest  to  the  present  discussion. 

Another  approach  involves  radical  changes  in  the  organization  of  the  central  processor,  in 
particular a higher degree of parallelism on the task, instruction or sub-instruction levels. This
sometimes implies more or less drastic changes in the way programs are thought about and
formulated, as exemplified by the CDC STAR [HolS71], ILLIAC-IV [BarG68], and C.mmp
[WulW72]  machines.  In  other  cases,  as  in  the  CDC  6600  design  [ThoJ64],  parallelism  is  on 
the  instruction  level,  retaining  the  classical  instruction  stream  concept  and  at  worst  requiring 
local reformulation of the algorithms. Instruction parallelism is peripherally of interest to our
discussion (see Section 2.3). Parallelism on the task level is outside the scope of this
thesis.

A third approach is to improve the architecture of the Instruction Set Processor (ISP) while
staying within the classical Von Neumann type of machine. This approach is the background
for our work. A difficulty with it, but also a major reason for it, may be the interest vested
in existing instruction sets. In such a case the problem may be how to extend the instruction
set compatibly, or to find features that may be removed at a reasonable cost. Data provided
by our methods may be used in solving this problem and also to some extent when designing
new instruction sets from scratch.


There is ample evidence that the ISP architecture is indeed an important factor in processor
efficiency and economy. Notable is a study by J. A. Stewart [SteJ.nd], comparing program
sizes and execution speed of three contemporary computers* having approximately the same
word sizes** and instruction execution times***. When moving benchmark programs between
these computers, program sizes varied by factors from 1.3 to 2.7 and running time by factors
up to 5****. Some of this variation may be due to inferior compilers and other software.
However, code sequences for commonly occurring constructs indicate that the problem to a
large extent lies with the instruction set.

Another example is provided by the Burroughs B1700 computer (see page 15). A
considerable gain in space and time is claimed by the designers of this computer system,
achieved by designing instruction sets tailored to the higher level language used.

Human  intuition  about  program  behavior  is  notoriously  bad.  This  has  been  demonstrated  by 
several  investigators.  One  example  is  given  by  Knuth  in  his  well  known  study  of  FORTRAN 
programs [KnuD70]. The personal experience of people who have observed some aspect of
their programs' behavior, as reported in countless stories of computer folklore, tends to
corroborate this.




The  cited  studies  clearly  demonstrate  a need  for  quantitative  methods  which  can  aid  the  ISP 
architect  in  deciding  values  for  the  design  parameters  of  his  ISP,  and  to  justify  his  decisions. 
The  data  obtained  should  be  as  independent  of  technology  as  possible,  so  that  they  will  not 
change  as  technology  progresses.  They  can  then  be  used  to  compare  the  cost  of 
implementing  a structure  using  different  technological  solutions,  or  to  compare  the  cost  and 
utility  of  different  structures  in  the  context  of  the  available  technologies. 


* The IBM 360/44, the SDS Sigma 5 and the PDP-10.

** 32 or 36 bits.

*** For commonly used instructions, factors ranged from 0.7 to 1.8 compared to the PDP-10.

**** The PDP-10 being the best.


Ideally the behaviour of all programs executing on the ISP should be studied. This can be
done only superficially, as by accounting data and similar information. For a detailed study
one is forced to restrict oneself to a set of, hopefully representative, subject programs.
Given  an  application  area,  and  such  a subject  set  to  represent  it,  there  are  several  methods 
of  obtaining  data  on  program  behaviour.  They  may  be  classified  as  static  or  dynamic 
methods,  depending  on  whether  data  are  collected  before  or  during  execution. 

Static  information  can  be  collected  manually,  by  compilers,  or  by  some  program  analyzing  the 
relocatable  or  absolute  code.  Such  methods  should  be  used  to  obtain  the  space  cost  (see 
Section  2.3)  of  the  code  and  static  data  structures,  but  can  not  be  used  to  obtain 
information  pertinent  to  the  execution  behavior  of  the  subject  program.  For  this  purpose 
dynamic  data  are  needed.  Several  methods  of  obtaining  such  data  are  described  and 
compared  in  Section  1.2.1.  We  chose  to  use  traces  containing  information  on  every 
instruction  executed  by  the  program.  These  traces  are  written  on  an  appropriate  storage 
medium,  and  are  analyzed  later  by  separate  programs.  The  advantages  of  this  method  are 
that  the  exact  sequence  of  events  is  preserved,  and  that  a large  amount  of  detail  may  be 
recorded.  We  discuss  the  appropriateness  of  this  choice  in  Section  2. 

As  we  present  the  methods,  their  intended  domain  is  to  evaluate  the  features  of  ISP 
architecture.  The  particular  ISP  design  parameters  that  we  consider  include  the  number  and 
types  of  registers,  the  data  types  and  their  operators,  control  operators  and  their  associated 
data  structures,  and  address  calculation  methods.  Our  methods  fall  mainly  in  two  groups,  one 
dealing  with  register  structure,  the  other  with  data  and  control  operators. 

Register  structure  is  evaluated  through  the  concept  of  "register  lives".  We  present  a method 
to  detect  such  lives,  and  to  find  to  what  extent  registers  are  simultaneously  alive.  From  this 
we  are  able  to  find  an  upper  bound  on  the  increase  in  execution  time  which  would  follow  if 
the  number  of  physical  registers  were  reduced.  We  also  present  a method  to  assess  the 
need  for  generality  of  registers. 
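
The notion of a register life, and the count of simultaneously live registers, can be illustrated by a minimal Python sketch. It is not the algorithm developed in Chapter 4. The event format (time, register, kind), the function names, and the choice to treat a load that is never used as a life of zero length are all assumptions made for illustration.

```python
from collections import defaultdict

def register_lives(events):
    """Split a trace into register lives.

    events: iterable of (time, register, kind) in time order, where kind is
    "load" or "use" (a hypothetical, simplified trace format).
    Returns a list of (register, load_time, last_use_time), sorted by load time.
    """
    open_life = {}  # register -> [load_time, last_use_time]
    lives = []
    for time, reg, kind in events:
        if kind == "load":
            if reg in open_life:
                # The next load closes the previous life at its last use.
                start, last = open_life[reg]
                lives.append((reg, start, last))
            open_life[reg] = [time, time]
        elif kind == "use" and reg in open_life:
            open_life[reg][1] = time
    for reg, (start, last) in open_life.items():
        lives.append((reg, start, last))
    return sorted(lives, key=lambda life: life[1])

def live_count_histogram(lives, last_time):
    """Return {k: number of time steps in 0..last_time with exactly k live registers}."""
    delta = [0] * (last_time + 2)
    for _reg, start, end in lives:
        delta[start] += 1      # life begins at its load
        delta[end + 1] -= 1    # and ends after its last use
    histogram = defaultdict(int)
    live = 0
    for t in range(last_time + 1):
        live += delta[t]
        histogram[live] += 1
    return dict(histogram)

events = [(0, "A", "load"), (1, "B", "load"), (2, "A", "use"),
          (3, "A", "load"), (4, "B", "use"), (5, "A", "use")]
lives = register_lives(events)
histogram = live_count_histogram(lives, last_time=5)
```

In this example register A carries two separate lives and B one. The histogram shows how many time steps have one or two registers live, which is the kind of data needed to ask how many registers a program actually requires.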


Our  methods  for  operators  and  data  types  are  based  on  frequency  counts  of  single  operators 
and  of  sequences  of  operators.  We  present  an  algorithm  for  counting  the  occurrences  of 
sequences  of  arbitrary  length,  including  a set  of  pruning  heuristics  designed  to  detect  which 
sequences  are  in  some  sense  significant.  Only  occurrences  of  such  sequences  are  counted; 
this  is  what  makes  our  algorithm  economically  feasible. 
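
The idea of counting sequences in several passes, keeping only those that survive pruning, can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm of the thesis: the pruning heuristics are not described in this section, so a plain minimum-count threshold (`min_count`, a hypothetical parameter) stands in for them, and the function name is likewise invented.

```python
from collections import Counter

def count_sequences(opcodes, max_len, min_count):
    """Count opcode sequences of length 1..max_len in several passes.

    Pass n counts only those sequences of length n whose prefix of length
    n - 1 survived the pruning of the previous pass. Pruning here is a simple
    threshold on the count, standing in for the unspecified heuristics.
    """
    survivors = {()}  # the empty sequence is the prefix of every single opcode
    result = {}
    for length in range(1, max_len + 1):
        counts = Counter()
        for i in range(len(opcodes) - length + 1):
            seq = tuple(opcodes[i:i + length])
            if seq[:-1] in survivors:
                counts[seq] += 1
        survivors = {seq for seq, n in counts.items() if n >= min_count}
        if not survivors:
            break
        result.update({seq: counts[seq] for seq in survivors})
    return result

trace = ["MOVE", "ADD", "MOVEM", "MOVE", "ADD", "MOVEM", "JRST"]
significant = count_sequences(trace, max_len=3, min_count=2)
```

Because a sequence is counted only when its prefix survived the previous pass, the table of candidates stays small as the length grows. This is the property that keeps space and time requirements manageable.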


We expect the methods to provide useful evaluation of existing designs as well as suggest
improvements in existing designs and give ideas and guidelines for new designs. Such new
designs could be for general purpose processors, or for processors specially designed for
some  particular  language  or  some  special  class  of  computations.  Such  a specialized 
application  is  defined  more  by  the  selection  of  subject  programs  to  which  we  apply  our 
methods  than  by  the  methodology  as  such. 

Our  methods  can  also  be  applied  to  domains  less  related  to  ISP  design.  As  will  be  seen  they 
have  obvious  applications  in  compiler  design  and  language  design,  and  also  in  the  art  of 
tuning  programs  to  make  them  more  efficient.  In  particular  we  expect  our  method  for 
register  utilization  to  be  of  interest  to  these  domains. 


As  in  any  other  inquiry,  the  answers  to  one  set  of  questions  raise  new  questions  that  one 
would  like  to  answer.  In  some  cases  our  methods  will  produce  compact  data  bases  which  will 
allow  certain  kinds  of  simple  questions  to  be  answered  after  the  original  analysis,  and  at  a 
much  lower  cost. 


1.2.1  Obtaining  dynamic  information 

Dynamic  information  can  be  collected  by  hardware  monitors,  by  programs  running  in  parallel 
with  the  subject  program*,  by  code  inserted  into  the  subject  program  by  the  compiler,  or  as 
in  our  case,  by  running  the  subject  program  on  an  interpreter  for  the  ISP  in  question.  In  any 
case,  the  data  can  be  analyzed  on  the  fly  or  saved  for  later  analysis  by  special  programs. 

Programs  or  hardware  monitors  may  be  used  to  sample  the  program  counter  and  other 
pertinent  parts  of  the  processor  state.  This  can  give  us  information  about  the  (relative) 
frequencies  of  various  events,  such  as  the  execution  frequency  of  the  different  parts  of  the 
program.  Considerable  analysis  of  the  subject  program  is  required  to  obtain  information 
about  its  local  behavior.  Information  about  the  sequence  of  events,  such  as  the  behavior 
across  programmed  jumps,  can  not  be  reconstructed  completely.  Also  no  information  about 
register  content  and  operand  values  is  available.  Furthermore,  in  the  case  of  sampling  by 
program,  the  results  are  not  exact,  but  depend  on  sampling  rate  and  random  events. 

* As can be done in several contemporary systems.

Code inserted by the compiler is usually restricted to maintaining execution frequency counts
for each straightline segment of code, since collecting more extensive information this way
would make code size prohibitive. Hence we again have the problem of reconstructing
sequences of events. Considerable analysis is needed to obtain detailed information on the
ISP level behavior of the program, since the primary data relates to the language level. We
are furthermore restricted to analyzing programs written in languages that have this feature
in their compiler (or a suitable preprocessor), and which are available for recompilation. The
inserted code also disturbs locality aspects of the program execution. It is, however, more
accurate than sampling, since we are guaranteed that all executed parts of the code are
represented in the results in proportion to their execution frequency.

We chose to run the subject program using an interpreter for the ISP under investigation,
and collected information on each instruction as it was interpreted. This method is usually
called instruction tracing, or just tracing. The information was, in our case, written on
magnetic  tape.  This  method  allows  one  to  study  not  only  the  instruction  stream  as  seen  by 
the  processor,  including  the  path  taken  through  sequences  of  programmed  jumps,  but  also  to 
follow  operand  and  index  values,  indirect  address  chains  etc.,  if  so  desired. 
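
The principle of tracing by interpretation can be shown with a small Python sketch. The ISP interpreted here is a hypothetical toy with four opcodes, not the PDP-10. The record fields, the opcode semantics, and the choice to record register and memory contents before the instruction executes are assumptions made for illustration.

```python
from typing import NamedTuple

class TraceRecord(NamedTuple):
    instruction: tuple      # (opcode, accumulator, address) as fetched
    pc: int                 # program counter of this instruction
    effective_address: int  # the toy ISP has no indexing or indirection
    ac_contents: int        # accumulator contents before execution
    ea_contents: int        # contents of the effective address before execution

def run_traced(program, memory, n_accumulators=4, max_steps=10_000):
    """Interpret a toy program and record one trace record per instruction."""
    acs = [0] * n_accumulators
    trace = []
    pc = 0
    for _ in range(max_steps):
        op, ac, addr = program[pc]
        trace.append(TraceRecord((op, ac, addr), pc, addr,
                                 acs[ac], memory.get(addr, 0)))
        next_pc = pc + 1
        if op == "LOAD":
            acs[ac] = memory.get(addr, 0)
        elif op == "ADD":
            acs[ac] += memory.get(addr, 0)
        elif op == "STORE":
            memory[addr] = acs[ac]
        elif op == "JUMPN":          # jump if the accumulator is nonzero
            if acs[ac] != 0:
                next_pc = addr
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc = next_pc
    return trace, memory

program = [
    ("LOAD", 0, 10),    # ac0 := loop counter
    ("ADD", 1, 10),     # ac1 := ac1 + memory[10]
    ("ADD", 0, 11),     # ac0 := ac0 - 1
    ("JUMPN", 0, 1),    # repeat while ac0 is nonzero
    ("STORE", 1, 12),
    ("HALT", 0, 0),
]
trace, memory = run_traced(program, {10: 3, 11: -1, 12: 0})
```

The resulting trace preserves both the exact path through the loop and the operand values seen at each step. Analysis programs can then read the trace without any knowledge of the language or compiler that produced the program.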

Also,  tracing  is  language  and  compiler  independent.  It  can  be  applied  to  any  subject  program 
that  can  be  brought  into  the  format  acceptable  to  the  interpreter.  In  many  cases  (as  in  ours) 
the  interpreter  will  be  a relocatable  module  running  on  its  own  ISP,  which  will  then  accept 
the  standard  relocatable  format  for  the  subject  program.  For  a microprogrammed  processor, 
the  microprogram  may  be  extended  to  output  the  information  desired  (See  page  16). 

A further  advantage  is  that  analysis  is  naturally  separate  from  the  data  collection.  Provided 
a rich  enough  trace  is  written,  new  types  of  analyses  can  be  performed  at  any  time  without 
having  to  retrace  the  subject  program.  Since  writing  the  trace  is  cheap  compared  to 
analysing  it,  this  may  at  first  sight  seem  to  be  of  little  value.  It  does,  however,  guarantee 
that  the  results  of  different  analyses  are  consistent  and  independent  of  changes  in  the 
program  traced,  the  compiler  compiling  it,  and  of  random  environmental  influences. 

In  terms  of  computer  resources  needed  to  apply  the  methods,  tracing  is  probably  more 
costly  than  the  others.  Tracing  a program  using  our  current  interpreter*  increases  running 
time by a factor of about 60, and the analysis programs are slow. This is, however, of little
importance.  As  will  be  seen,  a considerable  amount  of  detailed  information  can  be  obtained 
at  a cost  which  is  not  prohibitive,  and  the  writing  of  the  analysis  programs  is  straightforward 
compared  to  what  it  would  be  with  the  other  methods,  to  obtain  similarly  detailed  information. 


* Interpreting the PDP-10 on the PDP-10.


To  have  sufficiently  detailed  information,  we  wrote  at  least  4 words  of  trace  for  each 
instruction executed. These were: the instruction word; the program counter and effective
address; the contents of the accumulator; and the contents of the effective address. If
indirection or byte access was used, two further words were written for each level of
indirection, containing the address and contents of the byte pointer or indirect word.
Writing at 556 bpi and blocking
1000  words  to  a tape  record,  this  allowed  us  to  trace  about  600  000  instructions  on  a 2400 
ft.  reel  of  tape.  This  corresponds  to  1.5  - 2 seconds  of  CPU  (PDP-10/KA10)  time  when 
executed  at  full  speed. 
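
The tape capacity quoted above can be checked with a rough calculation, sketched here in Python. Two figures are assumptions not stated in the text: that each 36-bit word occupies 6 tape frames (as on 7-track tape with 6 data bits per frame), and that each record is followed by an inter-record gap of 0.75 inch. Leader, trailer, and the extra words written for indirection are ignored.

```python
def instructions_per_reel(reel_feet=2400, density_bpi=556,
                          words_per_record=1000, words_per_instruction=4,
                          frames_per_word=6, gap_inches=0.75):
    """Estimate how many traced instructions fit on one reel of tape.

    frames_per_word and gap_inches are assumed values (see the lead-in);
    the remaining defaults are the figures given in the text.
    """
    # Length of one record on tape: data frames plus the inter-record gap.
    record_inches = words_per_record * frames_per_word / density_bpi + gap_inches
    records = int(reel_feet * 12 / record_inches)
    return records * words_per_record // words_per_instruction

estimate = instructions_per_reel()
```

Under these assumptions the estimate comes out a little over 620 000 instructions, which is consistent with the figure of about 600 000. The same numbers imply an average of roughly 2.5 to 3.3 microseconds per instruction at full speed (1.5 to 2 seconds divided by 600 000).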

Most  of  our  methods  use  only  the  instruction  word.  Hence  time  could  be  saved  both  while 
tracing  and  analyzing,  by  omitting  the  other  information  in  the  trace.  This  would  also  permit 
more  information  to  be  written  on  each  tape.  In  the  interest  of  generality,  however,  we  used 
the  approach  stated. 

An  alternative  to  instruction  by  instruction  tracing  is  the  jump  trace  described  by  Alexander 
[AleW72],  (see  page  14).  With  this  tracing  method  information  is  written  to  the  trace  only 
at  instructions  which  change  the  program  counter.  In  between  such  points  the  program  runs 
at  full  speed.  This  method  is  fast,  but  information  on  operands  and  register  contents 
between  tabulation  points  is  lost.  To  fully  realize  the  gain  in  speed,  the  compiler  should 
know  about  the  tracer  and  insert  appropriate  instructions  to  call  it.  Analysis  is  simplified  if 
the compiler also outputs a file of descriptions of each straightline segment of code. This
dependence on the compiler restricts the set of subject programs that can be analyzed,
increases  code  size  and  disturbs  locality,  as  discussed  above. 
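
What a jump trace retains can be seen from a short Python sketch that rebuilds the sequence of program counter values from jump records alone. The record format, a list of (from address, to address) pairs for taken jumps, and the assumption that every instruction occupies exactly one address are simplifications made for illustration and are not taken from [AleW72].

```python
def expand_jump_trace(start_pc, jumps, end_pc):
    """Rebuild the executed program counter sequence from a jump trace.

    jumps: list of (from_pc, to_pc) pairs, one per taken jump, in execution
    order (a hypothetical record format). Between jumps the program is
    assumed to run straight through consecutive addresses.
    """
    path = []
    pc = start_pc
    for from_pc, to_pc in jumps:
        path.extend(range(pc, from_pc + 1))  # straightline segment up to the jump
        pc = to_pc
    path.extend(range(pc, end_pc + 1))       # final segment after the last jump
    return path

path = expand_jump_trace(start_pc=0, jumps=[(3, 1), (3, 1)], end_pc=5)
```

The instruction path is recovered completely from two short records, but nothing in the jump trace tells us the operand or register values inside each segment. This is the loss of information noted above.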


1.3  Restrictions  in  domain 

We  will  restrict  ourselves  to  traces  obtained  by  executing  single  programs  on  an  interpreter 
for  the  ISP  to  be  evaluated.  This  means  that  we  bar  ourselves  from  studying  problems 
related  to  interrupt  handling,  detailed  I/O  management,  multiprogramming  and  other  operating 
system  issues.  On  the  other  hand  it  allows  us  to  concentrate  on  the  behavior  of  one  single 
program  during  a continuous  span  of  time,  without  being  disturbed  by  interference  from 
other  programs.  This  permits  a study  of  the  local  behavior  of  the  subject  program  to  any 
desired  level  of  detail.  From  this  point  of  view  the  invisibility  of  interrupts  is  a strength 
rather  than  a restriction.  Also,  a change  in  the  execution  speed  of  an  operating  system  will 
imply a change in the behaviour of its environment. Hence in studies of operating system
behaviour one should restrict oneself to information that can be collected on the fly.

A further advantage is that the trace is reproducible and free from random perturbations
caused  by  interrupts  etc.  This  is  not  strictly  true  for  programs  that  use  shared  resources 
(such  as  primary  memory  dynamically  allocated  to  users)  or  resources  that  operate  in  parallel 
to  the  traced  program.  In  such  cases  different  code  might  be  executed  depending  on 
resource  status. 

Although most of our methods are applicable with minor modifications to most ISPs, we focus
our  attention  on  ISPs  with  a general  register  structure.  We  take  this  term  in  a wide  sense, 
meaning  roughly  that  a sizeable  repertoire  of  operations  is  available  uniformly  over  a vector 
of  4,  8 or  more  registers.  Another  characteristic  is  that  the  registers  can  be  addressed  from 
more  than  one  field  of  the  instruction  word*.  (See  also  Chapter  4).  Limiting  cases  are  2 or 
3 address  machines  on  one  hand  and  one  address  machines  with  no  index  registers  on  the 
other;  we  do  not,  however,  consider  these. 

Our  experimental  results  are  from  the  PDP-10,  which  has  a vector  of  16  extremely  general 
registers,  and  a very  general  instruction  set,  particularly  for  control  operations  (a  rich  set  of 
skips  and  jumps,  several  forms  of  subroutine  jumps  etc.).  Hence  this  ISP  is  a good  starting 
point  for  detection  of  unnecessary  features.  However,  as  will  be  seen,  we  have  also  been 
able  to  detect  some  deficiencies  of  this  ISP  that  are  not  due  to  unnecessary  generality. 


1.4  Related  work 

Studies  of  frequency  counts  of  instruction  executions  have  been  described  by  several 
authors.  The  best  known  is  the  Gibson  mix,  developed  by  Jack  C.  Gibson  at  IBM  in  1959. 
Gibson  divided  the  instructions  of  the  IBM  704  and  650  into  13  classes  and  counted  how 
many  instructions  were  executed  from  each  class.  His  sample  size  was  17  programs, 
approximately  9 million  instructions.  The  results  are  described  in  [GibJ70];  we  tabulate  them 
in  Figure  5-3. 

Gonter [GonR69] has compared the Gibson mix and the UMASS mix**, using essentially the
same  classification  and  tracing  15  million  instructions  on  the  CDC  3600.  His  results  correlate 
well  with  Gibson’s;  they  are  tabulated  in  Figure  5-3. 


* Accumulator  field,  index  field,  memory  address  field,  base  register  field  etc. 
** UMASS - University of Massachusetts




The substance of these results is that LOADs and STOREs account for about 30% of the
instructions executed, branches for 16% to 38%, index manipulations 13% to 18%, arithmetic 3%
to 19%. The results depend both on the ISP and the subject set.
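
A mix of this kind is a frequency count over instruction classes. The following Python sketch shows the computation for a dynamic trace; the class table and the sample trace are hypothetical and do not reproduce Gibson's 13 classes.

```python
from collections import Counter

# Hypothetical mapping from opcode mnemonic to instruction class.
CLASS_OF = {
    "LOAD": "load/store", "STORE": "load/store",
    "JUMP": "branch", "SKIP": "branch",
    "ADD": "arithmetic", "SUB": "arithmetic",
    "INDEX": "index",
}


def instruction_mix(trace):
    """Return {class: fraction of executed instructions} for a dynamic trace."""
    counts = Counter(CLASS_OF.get(op, "other") for op in trace)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


trace = ["LOAD", "ADD", "STORE", "JUMP", "LOAD",
         "INDEX", "SKIP", "LOAD", "SUB", "STORE"]
mix = instruction_mix(trace)
```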

Other similar mixes and experiments are reported by Arbuckle [ArbR66], Connors, Mercer
and  Sorlini  [ConW70],  Raichelson  and  Collins  [RaiE66],  and  Herbst,  Metropolis  and  Wells 
[HerE55].  The  latter  is  the  earliest  report  known  to  the  author. 

The  emphasis  of  the  above  studies  was  mostly  on  evaluation  of  the  raw  processing  capacity 
of the central processor. Little emphasis was placed on improvements in the instruction
repertoire  or  central  processor  structure. 

Foster,  Gonter  and  Riseman,  [FosC71a]  have  gone  one  step  further,  by  starting  to  investigate 
the  effects  of  reducing  the  instruction  set.  They  report  their  experience  with  two  measures 
of  instruction  set  utilization.  Both  of  these  measures  are  equally  applicable  to  static  and 
dynamic  instruction  counts.  The  static  measures  give  an  estimate  of  the  space  cost  (Section 
2.3) and the dynamic measures estimate the time cost (Section 2.2) associated with
using the instruction set. The examples of [FosC71a] use the CDC 3600. Our use of these
measures  is  described  in  Section  5.1. 

The  first  of  their  measures  is  the  undiluted  information-theoretic  measure  of  information 
content: 

$$ I = -\sum_{i=1}^{T} p_i \log_2(p_i) $$

where

$p_i$ is the probability of using the i’th opcode
$T$ is the total number of different opcodes
$\log_2$ is the logarithm base 2
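
As a minimal illustration of this measure, the following Python sketch computes I from a list of observed opcodes, estimating each probability as a relative frequency. The function name and the sample data are illustrative and are not taken from [FosC71a].

```python
import math
from collections import Counter


def opcode_information(opcodes):
    """Average information per opcode, I = -sum(p_i * log2(p_i)), in bits."""
    counts = Counter(opcodes)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


uniform = opcode_information(["A", "B", "C", "D"])   # four equally likely opcodes
skewed = opcode_information(["A"] * 7 + ["B"])       # one opcode dominates
```

The measure reaches its maximum, log2(T), when all T opcodes are equally probable, and falls as usage concentrates on a few opcodes.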

Intuitively,  the  interpretation  of  I is  the  average  number  of  bits  of  information  conveyed  by 
each  opcode.  The  value  of  this  measure  is  doubtful,  particularly  with  a fixed  wordlength, 
since  the  space  that  could  be  saved  in  each  instruction  word  by  using  the  encoding  depends 
on  the  frequency  of  occurrence  of  the  instruction  in  question,  and  has  no  relation  to  its  need 
for  operand  addressing  capability  etc.  Furthermore,  optimal  encoding  with  respect  to  it 
implies variable length encoding of the opcodes and a correspondingly more complicated

The other measure they propose is a function computed as follows: Order the operation
codes by frequency of occurrence. Let $C_i$ be the number of occurrences of the i’th opcode
in this ordering ($C_i \ge C_{i+1}$ for all i). Let P be the total number of instructions in the sample,
and T the number of different opcodes, as before. The FGR function is then computed as:


$$ \mathrm{FGR}(N) = 1 - \frac{1}{P} \sum_{i=1}^{N} C_i \qquad (1 \le N \le T) $$


This  function  measures  the  effort  necessary  to  recode  or  run  the  original  program  on  a 
central  processor  with  a smaller  instruction  set.  Indeed  FGR(N)  is  that  fraction  of  the 
instructions  which  would  have  to  be  recoded  (static)  or  interpreted  (dynamic),  were  the 
instruction  set  reduced  to  the  N most  commonly  occurring  instructions.  For  some  of  these 
the recoding might be impossible; this is not taken into account.

Substituting execution times for $C_i$ and P above, and ordering the $C_i$ accordingly, we obtain a
measure of the fraction of execution time accounted for by the omitted instructions, in this
case the least time-consuming ones.
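
The FGR function, in both its count-based and time-based forms, can be sketched in a few lines of Python. The function name, the dictionary representation and the sample counts are assumptions made for illustration only.

```python
def fgr(weights, n):
    """FGR(N) = 1 - (1/P) * (sum of the N largest weights), P being their total.

    `weights` maps each opcode to its occurrence count or, for the time-based
    variant, to the total execution time spent in that opcode.
    """
    ordered = sorted(weights.values(), reverse=True)
    if not 1 <= n <= len(ordered):
        raise ValueError("N must satisfy 1 <= N <= T")
    return 1 - sum(ordered[:n]) / sum(ordered)


# Hypothetical opcode counts for a sample of 100 instructions.
counts = {"MOVE": 50, "JUMP": 25, "ADD": 15, "MUL": 7, "DIV": 3}
remaining_after_3 = fgr(counts, 3)   # fraction left outside the 3 most common opcodes
```

With static counts the result estimates the fraction of code to be recoded; with dynamic counts or execution times it estimates the fraction of executed instructions or of run time that would fall to interpretation.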

These measures were used on a set of CDC 3600 programs. In the dynamic case the
suboperation  field  of  the  opcodes  was  disregarded.  Also,  a different  sample  was  used  for 
the  static  results  than  for  the  dynamic  ones.  The  static  I varied  from  3.59  to  5.36  for  the 
different  programs,  with  a theoretical  maximum  of  7.16.  The  dynamic  I varied  from  3.94  to 
4.64,  with  a theoretical  maximum  of  6.00.  FGR(32)  varied  from  0 to  about  0.2  in  the  static 
case,  and  from  1 to  0.06  in  the  dynamic  case.  This  shows  that  a reduction  of  the  instruction 
set to 32 instructions would cause some increase in program space, but that the instructions
that  must  be  interpreted  are  ones  that  are  executed  rarely. 


* An approximation to this encoding was used with the Burroughs B1700. See further
discussion on page 15.

A related study is by Foster and Gonter [FosC71b]. They investigated the effect of
interpreting opcodes differently depending on the recent history of the ISP. Thus on a one
accumulator machine the sequence LOAD ADD occurs often, LOAD LOAD hardly ever. Hence
the LOAD and ADD instructions might use the same encoding in the instruction word, provided
the LOAD instruction changes the state of the decoder. A "set state" instruction provides the
necessary escape mechanism. The intended application is to combine a large instruction set
with a small opcode field, thus freeing instruction word space for addressing. They verify
their idea by an analysis of some CDC 3600 programs.

The results show that over 67% of the instructions could be executed without use of the
escape mechanism, even if the opcode field was reduced to 3 bits. For a 5 bit field, 95% of
the instructions could be executed directly. By circumventing some machine specific
properties in their data, the result for 3 bits was improved to 74%.
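
The kind of figure quoted above can be illustrated with a deliberately simplified model of history-dependent decoding. The sketch assumes that the decoder state is just the previous opcode, that in each state the 2^k - 1 most frequent successors receive short codes, and that one code is reserved for the escape. These assumptions, and all names in the code, are ours and do not describe the actual scheme of [FosC71b].

```python
from collections import Counter, defaultdict


def direct_fraction(trace, bits):
    """Fraction of instructions decodable without the escape mechanism.

    Assumption: the decoder state is the previous opcode, and in each state
    the 2**bits - 1 most frequent successors get short codes (one code is
    reserved for the escape).
    """
    successors = defaultdict(Counter)
    for prev, op in zip(trace, trace[1:]):
        successors[prev][op] += 1
    slots = 2 ** bits - 1
    direct = sum(
        n for table in successors.values() for _, n in table.most_common(slots)
    )
    return direct / (len(trace) - 1)


trace = ["LOAD", "ADD", "STORE", "LOAD", "ADD", "STORE",
         "LOAD", "SUB", "STORE", "JUMP"]
one_bit = direct_fraction(trace, 1)
two_bits = direct_fraction(trace, 2)
```

The code tables here are built from the same trace they are evaluated on, so the fractions are optimistic for any other program.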

Riseman  and  Foster  [RisE72]  [FosC72]  have  used  traces  to  study  the  effect  of  data 
dependencies on the execution speed of parallel processors. They postulate a machine
where only the execution of the instructions takes time; instruction fetch and dispatch, and
data  fetch  and  store,  take  no  time.  Further  there  is  an  infinite  supply  of  registers  and 
functional  units  so  that  no  instruction  is  held  up  for  the  lack  of  hardware.  The  instruction  set 
is  as  for  a CDC  3600,  and  traces  from  this  machine  were  used  in  their  experiments. 

There  are  two  restrictions  which  prevent  instructions  from  being  executed: 

a)  Their  operands  have  not  yet  been  computed. 

b)  The  exact  instruction  to  execute  can  not  be  determined  until  some  condition  (jump) 
has  been  resolved. 

Restriction  b)  can  be  circumvented  by  assuming  a nondeterministic  processor,  where  both 
paths of the program are executed in parallel until the condition is resolved. This
nondeterminacy  can  be  carried  to  infinite  depth,  or  restricted  to  a maximum  of  N unresolved 
conditions. 

The experiments show an average speedup by a factor of 1.72 for N = 0, 2.72 for N = 1,
7.21 for N = 8, and 24.4 for N = 128. For infinite nondeterminacy (N = ∞) the speedup was
by a factor of 51.2. Similar results were found by Tjaden and Flynn [TjaG71]. The results
show that conditional jumps, and their dependency on calculated results, are a severe
restriction on execution speed.
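
The two restrictions can be turned into a small scheduling calculation. The Python sketch below is a simplified model, not the procedure of [RisE72]: every instruction takes one time unit, dependencies through memory are ignored, and an instruction may start only when its source registers are available and all but the last N preceding conditional jumps have been resolved. The trace format and all names are hypothetical.

```python
def parallel_time(trace, max_unresolved=None):
    """Completion time of a trace on the idealized machine.

    Each trace entry is (dest, sources, is_conditional_jump). `max_unresolved`
    is N, the number of conditional jumps that may be bypassed before they
    are resolved; None means no limit.
    """
    ready = {}        # register -> time at which its latest value is available
    resolved = []     # running maximum of jump resolution times, in trace order
    finish_all = 0
    for dest, sources, is_jump in trace:
        start = max((ready.get(r, 0) for r in sources), default=0)
        if max_unresolved is not None and len(resolved) > max_unresolved:
            # All but the last N preceding jumps must have been resolved.
            start = max(start, resolved[len(resolved) - max_unresolved - 1])
        finish = start + 1
        if dest is not None:
            ready[dest] = finish
        if is_jump:
            resolved.append(max(finish, resolved[-1] if resolved else 0))
        finish_all = max(finish_all, finish)
    return finish_all


trace = [
    ("r1", [], False),        # r1 = constant
    ("r2", [], False),        # r2 = constant
    (None, ["r1"], True),     # conditional jump on r1
    ("r3", ["r2"], False),    # r3 = f(r2)
    (None, ["r3"], True),     # conditional jump on r3
    ("r4", [], False),        # r4 = constant
]
sequential = len(trace)       # one time unit per instruction, executed in order
```

Dividing `sequential` by `parallel_time(trace, N)` gives the speedup for a given N under these assumptions.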

Several  investigators  have  used  traces  to  study  addressing  patterns,  with  the  object  of 
determining  optimal  design  of  paging  systems  and  cache  memories.  We  mention  Coffman  and 
Varian  [CofE68],  Gibson  [GibD67],  Hatfield  [HatD72],  Kaplan  [KapK71]  (see  below),  Lewis  and 
Yue [LewP71], and Seligman [SelL.nd].

A few authors have described more comprehensive studies based on traces:

At IBM, Murphey and Wade [MurJ70] used traces to evaluate the performance of the IBM
360/195. Traces were made of programs believed to be representative of the 195 workload,
as  they  were  executed  on  other  360  models.  Detailed  studies  were  made  of  the  behavior  of 
these  programs  in  a 195  simulator.  The  emphasis  of  this  study  was  on  design  validation  and 
performance  prediction.  Particular  studies  were  made  of  the  efficiency  of  the  mechanism  for 
parallel  execution  of  different  instructions. 


Winder at RCA [WinR71], [WinR73] describes the method of tracing used on the RCA Spectra
70/45 and also in some detail the various studies performed. These include cache system
studies [KapK71], paging analysis, miscellaneous program statistics emphasizing I/O,
branching and conditions, indexing, and operand length for variable length operands. A
SIMSCRIPT simulator driven by the trace was used to investigate architectural variants like
memory banking, cache parameters, instruction lookahead, multiprocessing etc.

Wortman  [WorD72]  has  designed  an  experimental  technique  to  evaluate  computer 
architecture,  in  particular  its  suitability  for  particular  programming  languages.  It  is  based  on 
collecting  static  and  dynamic  statistics  on  the  use  of  language  fragments.  Language 
fragments  are  constituents  of  program  code  which  map  into  non-overlapping  segments  of 
object programs, and which do not contain data dependent loops. As a case study Wortman
chose  a PL/I  dialect  called  Student  PL,  and  designed  a stack  oriented  architecture  suitable  for 
this  language.  An  interpreter  for  the  architecture  was  written,  and  also  a compiler  to 
translate  Student  PL  programs  into  its  machine  language.  For  his  subject  set  he  chose  about 
1000  small  student  programs  from  an  undergraduate  programming  course.  Three  kinds  of 
statistics  were  observed: 

Source program statistics, essentially the number of applications of each production during
syntax analysis.

Object program statistics, i.e. frequencies of occurrences of the machine instructions
(language fragments), and pairs and triples of these in the generated code.

Run time statistics, i.e. frequencies of execution for the individual machine instructions.
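
Counts of pairs and triples, as in the object program statistics above, amount to counting contiguous subsequences. A minimal Python sketch follows; the instruction names are invented and the function is not taken from [WorD72].

```python
from collections import Counter


def sequence_counts(opcodes, length):
    """Count every contiguous opcode sequence of the given length."""
    windows = zip(*(opcodes[i:] for i in range(length)))
    return Counter(windows)


code = ["PUSH", "PUSH", "ADD", "POP", "PUSH", "PUSH", "ADD", "POP"]
pairs = sequence_counts(code, 2)
triples = sequence_counts(code, 3)
```

Applied to the generated code this gives static counts; applied to an execution trace it gives the corresponding dynamic counts.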


Based  on  these  statistics  he  made  several  improvements  in  the  instruction  set,  and  found 
reductions of about 50% in each of program storage space, data and instruction accesses, and
number  of  bits  accessed.  The  most  significant  improvements  were: 


Information relating object instructions to source lines was moved to secondary storage.

The data accessing method was improved.

An immediate type instruction was introduced to move constants to the stack. (72% of
the constants found were integer constants, and 98.8% of these could be represented in
6 bits).

The  handling  of  conditionals  and  "builtin"  functions  was  improved. 

By  refining  his  language  fragments  Wortman  also  was  able  to  compare  his  machine  design 
with the IBM 360 as a vehicle for PL/I.


Alexander [AleW72] has made a study similar to Wortman's, but for an existing ISP (the IBM
360) and a language (XPL [McKW70]) used mostly for compiler writing. His main goal was to
investigate how the features of the XPL language were used, and what requirements they
posed on the ISP. He presents statistics on source programs, object programs and run time
behaviour. These were obtained by modifying the XPL compiler (XCOM), and by full tracing
and jump tracing. His subject set was slightly different for the different analyses; it
consisted of XCOM, several compilers written for undergraduate and graduate courses, and
his own analysis programs. His results can be summarized as:


Floating point and decimal arithmetic are not used by XPL; this leaves 91 instructions that
can potentially be generated by XCOM. Of these only 47 were actually generated. 10 of
these account for 84% of the instructions executed. The 10 most generated instructions
account for 85% of the total number of generated instructions; this set shares 9
instructions with the previous set of 10.

XCOM allocates 3 registers as accumulators. The first of these was named in 47% of the
accumulator references (as opposed to index or base register references). The second
was named in 26%, and the third in 11% of the accumulator references. Hence
expressions rarely are complicated enough that many accumulators are needed. The
register used for indexed access accounts for 11% of the accumulator references.

42% of the references to index or base registers were to register 0, i.e. no indexing or
base was used. That is: almost half of the addresses were unmodified. 8% were used in
array accessing, 31% were used to access statically allocated data (as base). 7 fixed
registers were allocated by XCOM for this latter purpose.

Most of the branches were to locations close to the branching instruction. Alexander
suggests  that  the  branch  instruction  of  the  360  could  be  modified  to  address  relative  to 
the  current  program  counter,  and  the  4 bits  now  used  for  base  register  addressing  could 
instead  be  used  to  augment  the  written  address  field,  to  make  it  16  bits  long.  Such  an 
instruction would suffice for 99% of all branches. 5K bytes of load instructions would be
eliminated, saving 15% of the program space.

If opcodes were conditionally decoded, as proposed by Foster and Gonter [FosC71b] (see
above), 16.2% of the program space could be saved by an encoding of the opcode in 3
bits.  This  result  pertains  to  one  particular  subject  program. 
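
The claim about relative branches can be checked on a trace by measuring how many branch targets lie within reach of a given displacement field. The following Python sketch assumes a signed two's-complement displacement counted in bytes from the address of the branch instruction; this encoding, the function names and the sample addresses are assumptions and are not specified in [AleW72].

```python
def fits_relative(branch_addr, target_addr, bits=16):
    """True if target - branch fits in a signed two's-complement field of `bits` bits."""
    displacement = target_addr - branch_addr
    return -(1 << (bits - 1)) <= displacement < (1 << (bits - 1))


def relative_coverage(branches, bits=16):
    """Fraction of (branch_addr, target_addr) pairs reachable by a relative branch."""
    return sum(fits_relative(b, t, bits) for b, t in branches) / len(branches)


# Hypothetical branch records: (address of branch, address of target).
branches = [(0x1000, 0x1010), (0x1000, 0x0F00), (0x2000, 0x9FFF), (0x2000, 0xA000)]
coverage = relative_coverage(branches)
```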

Alexander  extensively  compares  his  dynamic  and  static  results,  and  comments  upon  the 
significance  to  constructs  used  or  not  used  within  loops,  and  on  special  properties  of  the  XPL 
language  and  system.  He  also  advocates  the  use  of  program  profiles,  and  in  this  context 
points out the need for string manipulating instructions in compilers.

Studies  of  architecture  based  on  tracing  have  probably  also  been  performed  by  computer 
manufacturers.  Such  work  is  usually  considered  "company  private",  and  is  not  published,  but 
a few  have  been:  The  work  by  Murphey  and  Wade  [MurJ70],  and  that  by  Connors,  Mercer 
and  Sorlini  [ConW70],  all  at  IBM,  and  also  that  by  Winder  [WinR71],  [WinR73]  and  Kaplan 
[KapK71]  at  RCA.  All  of  these  are  mentioned  above. 

A particularly  interesting  machine  design  is  the  Burroughs  B1700,  [WilW72a],  [WilW72b].  In 
this system microcoded interpreters are provided for several "S-languages"; each of them
corresponds roughly in level to a classical machine language, but is tailored to fit the needs
of  a particular  higher  level  language.  The  microprograms  address  memory  by  bit  position, 
and  desired  access  width  is  supplied  on  each  access.  Hence  the  processor  gains  efficiency 
primarily  in  two  ways: 

a)  Time  efficiency  is  gained  by  using  an  S-language  tailored  to  the  application  (higher 
level  language),  hence  having  essentially  the  "right  instructions"  for  the  task  at  hand. 
Each  instruction  is  usually  more  complex  than  most  classical  machine  instructions. 

b)  Space  efficiency  is  gained  by  encoding  the  S-language  instructions  in  different 
formats  depending  on  the  need  for  space  to  represent  the  feature  in  question,  and  its 
frequency of use.

One such S-language is SDL, particularly suited to systems programming. The opcodes of this
language are of 3 lengths, 4, 6 or 10 bits, whereas a fixed length encoding would require 8
bits. By using this encoding, space is gained at the cost of an increased decoding time. The
two encodings mentioned were compared to the Huffman encoding, which is space optimal.
The  following  results  were  found: 

Encoding:            Space saved:   Time lost:

Fixed 8 bits         0%             0%
SDL 4, 6, 10 bits    39%            2.6%
Huffman code         43%            17.2%

Hence the chosen encoding is almost as space efficient as the Huffman encoding, and almost
as time efficient as the fixed field encoding.
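
The space side of such a comparison can be reproduced for any opcode frequency table. The Python sketch below computes Huffman code lengths and the frequency-weighted mean opcode length; the frequencies are invented, and the code says nothing about decoding time or about how the B1700 designers performed their comparison.

```python
import heapq
from itertools import count


def huffman_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over `freqs`."""
    tiebreak = count()   # unique second key so that lists are never compared
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1          # every merge adds one bit to its members
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


def average_bits(freqs, lengths):
    """Frequency-weighted mean code length."""
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total


# Hypothetical opcode frequencies.
freqs = {"LOAD": 8, "STORE": 4, "ADD": 2, "JUMP": 1, "MUL": 1}
lengths = huffman_lengths(freqs)
huffman_avg = average_bits(freqs, lengths)
fixed_avg = average_bits(freqs, dict.fromkeys(freqs, 3))   # 3 bits cover 5 opcodes
```

Any other scheme, such as one with a few permitted lengths, can be evaluated with `average_bits` by supplying its own length table.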


Similarly  the  SDL  addresses  were  encoded  using  8 different  formats  and  a 3 bit  field  to 
distinguish them, giving a 38% saving in memory space compared to the 4 byte addresses
needed  on  a byte  oriented  machine  with  fixed  length  addresses  spanning  the  same  address 
space. 

For  FORTRAN  and  COBOL  programs,  using  the  appropriate  S-language,  the  reduction  in 
program space was found to be 40% - 70% over the IBM 360 and the Burroughs B3500.

Furthermore,  access  width  can  be  a parameter  to  the  S-language  interpreter,  allowing  the 
compiler  to  generate  code  more  suited  to  the  actual  problem  and  also  making  possible  a 
planned  "Dial  a precision  FORTRAN". 

Wirth [WirN72] has given a qualitative review of a particular ISP, the CDC 6000 series, from
the  viewpoint  of  programming  ease  and  error  detection.  In  particular  he  points  out 
deficiencies  of  the  data  representations  and  operator  implementations  that  make  the 
detection  of  errors,  and  hence  the  guarantee  of  a correct  result,  impossible  or  at  best 
uneconomical.  He  also  points  out  the  lack  of  an  instruction  for  calling  reentrant  programs. 
His  experience  is  from  the  implementation  of  PASCAL  [WirN71]  for  this  ISP,  but  his 
arguments apply equally well to all language implementations where security and error
detection are a design goal, and to all uses of recursion or reentrancy.


For microprogrammed processors, the microprogrammed interpreter can be extended to
collect  execution  time  data.  This  approach  is  advocated  by  Saal  and  Shustek  [SaaH72].  For 
simple  types  of  data  this  allows  the  subject  program  to  run  at  almost  full  speed.  However, 
full tracing by microprogram will be limited in speed by the device recording the trace.
Since analysis time is considerably larger than trace time in any case, the advantage is
doubtful. The authors discuss various aspects of implementing such techniques, and present
data relating to opcode utilization and frequent instruction pairs. These results differ little
from those of Alexander [AleW72] and Foster et al. [FosC71b].


We  have  previously  identified  the  most  important  dimensions  of  ISP  architecture  to  be: 
register structure, data types and operators, control operators and structures, and address
calculation. 

Of  these,  the  operator  dimensions  have  been  relatively  well  explored  in  the  works  cited. 
This applies in particular to studies of the utilities of existing operators and possibilities for
more efficient encodings. The problem of finding desirable but nonexistent opcodes has
been  touched  upon  by  Alexander  and  Wortman,  but  needs  further  work. 


Other  properties  of  control  have  been  partially  explored,  particularly  locality  of  jumps 
(Alexander),  and  the  use  of  test  instructions  and  conditions  (Alexander,  Winder).  Locality 
properties  of  address  streams  have  been  studied  in  connection  with  virtual  memories  and 
caches,  but  the  data  structuring  aspect  is  largely  unexplored.  Register  structure  has  barely 
been  touched  (Alexander). 


1.4.1  Contributions  of  the  thesis 

Our  main  contribution  to  this  field  of  work  is  the  methods  for  register  utility  and  generality. 

We also break new ground in our work on instruction sequences. Previously Alexander (see
page 14) has presented dynamic counts of sequences, but only of length up to 3. Our
present program can accumulate counts for sequences of lengths up to 20*. Our pruning
heuristics make the accumulation of counts for sequences of this length economically feasible.
In fact we point out an improvement to our algorithm which will make the accumulation of
sequences of this length and longer much more efficient than with our present program.

* This limit was arbitrarily set because we believed longer sequences would not be of
interest. The method can handle sequences of arbitrary length.

Finally our approach is general (see Section 1.2.1): we present results spanning algorithms
coded in several languages and by different programmers, and we try to evaluate the
influence of these factors on our results. Earlier work has in some cases ([AleW72] and
[WorD72]) been confined by methodology and other considerations to one language. In other
cases the selection of subject programs and goals has been more restricted.

We  can  not  leave  this  section  without  mentioning  the  influence  on  our  work  by  that  of  Foster, 
Gonter  and  Riseman  [FosC71a].  The  FGR  function  introduces  some  very  simple  and  relevant 
measures of the utility of ISP features, namely the change in execution time or instruction
count resulting from a change in the ISP. Foster et al. applied this idea to opcode utilisation.
Much  of  our  work  consists  of  applying  it  to  other  features  of  ISP  architecture. 


CHAPTER  2 


COSTS 


In  this  chapter  we  discuss  the  various  basic  cost  measures  pertaining  to  ISP  features.  After 
some  introductory  remarks  we  list  four  types  of  cost.  For  each  of  these  we  discuss  its 
definition  and  other  relevant  issues,  such  as  the  way  or  ways  we  measure  it  and  their 
related  inaccuracies,  other  ways  to  measure  it,  and  its  relation  to  the  other  types  of  costs. 
As  a necessary  introduction  to  this  discussion  we  will  make  some  comments  on  the  instruction 
word  and  issues  related  to  it.  This  follows  after  the  introductory  remarks. 

The  four  types  of  cost  we  propose  are  general.  We  believe  they  apply  to  all  ISP  structures, 
not  only  those  with  general  registers.  The  units  in  which  we  measure  might,  however,  vary 
with  the  structure  of  the  processor  in  question.  This  is  true  even  within  the  class  of  general 
register  processors. 

Computer  resources  are  allocated  in  units  of  space  and  time:  space  in  memory  units,  time  in 
processing,  control  and  communication  units.  Since  some  memory  must  be  in  use  whenever 
the  central  processor  is  in  use,  the  product  of  space  and  time  is  a relevant  measure  of  cost 
for  the  usage  of  memory  units  and  time  alone  for  other  units.  These  are  the  basic  units  for 
measuring  the  costs  incurred  by  running  the  program  on  the  machine.  Relating  these  to 
economic  terms  requires  knowledge  of  the  actual  cost  of  the  units  of  the  computer,  and  of 
the  operating  expenses.  In  addition,  the  cost  of  producing  the  program  (designing,  coding 
and  debugging),  in  terms  of  human  effort  and  machine  resources,  depends  on  a good  ISP 
design  and  may  be  highly  relevant. 

Since  we  are  concerned  with  the  ISP  we  will  disregard  costs  related  to  secondary  memory 
except  insofar  as  they  are  expressed  by  the  costs  relating  to  primary  memory.  Similarly  the 
basic instructions for I/O are not part of the ISP seen by the user (See Section 1.3), hence
we also disregard I/O costs and the costs of control and communication units. The latter are
to some extent expressed by the cost of the central processor. The time cost (see below)
associated with I/O and secondary memory usage is considered independent of and irrelevant
to  ISP  architecture,  and  will  be  disregarded  except  where  explicitly  noted  otherwise. 




Motivated  by  the  above  remarks  and  by  further  discussion  below,  we  will  regard  the  costs  of 
having  or  lacking  a given  feature  in  an  ISP  as  falling  in  4 basic  categories: 

1)  Execution  time  (time  cost) 

2)  Memory  space  (space  cost) 

3)  Programming  effort  (programming  cost) 

4) Hardware to implement the feature (hardware cost).

This  list  is  roughly  in  order  of  importance.  Our  methods  will  be  almost  solely  concerned  with 
time  cost,  but  the  others  will  be  kept  in  mind  and  mentioned  when  relevant. 

The  weighing  and  trading  off  of  these  costs  is  the  concern  of  the  ISP  designer  and  falls 
outside  the  scope  of  this  thesis.  Our  goal  is  to  provide  methods  for  computing  them,  and  in 
particular  the  time  cost,  exactly  or  approximately,  as  seems  relevant  and  possible  for  the 
feature  in  question. 


2.1  The  role  of  the  instruction  word 

The instruction word occupies a central position in any ISP design, being the quantum in
terms of which the ISP forces the programmer to express his algorithm. Hence it brings
together  all  the  issues  of  ISP  design  and  must  be  a focal  point  for  our  research. 

Some  different  views  on  how  the  instruction  word  can  be  organized  are  represented  by  the 
CDC  6000  series,  the  PDP-10  and  the  IBM  360  series.  The  6000s  have  60  bit  words  and 
about  70  different  user  instructions  packed  2 to  4 to  a word;  the  PDP-10  has  36  bit  words 
and about 420 different user instructions each filling one word; the 360 has about 130 user
instructions of 16, 32 or 48 bits; the major data formats are 16 or 32 bits, memory fetch
width  is  8,  16,  32  or  64  bits  depending  on  the  model.  Good  performance  is  attempted  in  the 
first  case  by  fast  instruction  issuance,  in  the  others  by  powerful  instruction  sets. 

We now present some of the issues relating to the instruction word organization in a top
down order, neither implying any order of importance nor a sequence in which design
decisions should be made. As is exemplified by the above designs, there is no generally
accepted  way  of  resolving  these  issues.  In  fact,  the  solution  is  often  strongly  influenced  by 
historical  or  marketing  constraints,  or  other  external  considerations.  In  particular  the 
introduction  of  the  8 bit  byte  by  IBM  with  the  360  series  in  1964  has  had  a standardizing 
influence.

The first issue is the size of the instruction word. The cost and power ranges, and in
particular the addressing space, planned for a new processor, will to a large extent influence
what features need to be accommodated in the instruction word. Its size is also influenced
by issues not relating to the instruction word as such, particularly the desired accuracy of
the  arithmetic  and  other  data  types  and  the  memory  fetch  width. 

A short  instruction  word  implies  at  first  sight  a small  space  cost.  Similarly  a short  instruction 
word  may  imply  reduced  instruction  fetch  time,  particularly  if  more  than  one  instruction  is 
packed  into  one  memory  word.  A slightly  shorter  decoding  time  might  also  result  from  a 
short  instruction  word.  However,  the  advantage  of  a short  instruction  word  turns  into  a 
disadvantage  when  the  set  of  available  features  becomes  too  poor.  At  some  point  commonly 
used  operations  have  to  be  expressed  as  a sequence  of  two  or  more  instructions,  and  both 
time cost and space cost rise†. Obviously there is an optimum for both space and time, not
necessarily the same, and probably not very well defined††. There is also an associated
hardware  cost,  usually  increasing  with  instruction  word  size. 

To  simplify  the  discussion  we  will  from  now  on  assume  that  the  word  length  is  given,  and  one 
and  the  same  for  instructions  and  for  integer  and  real  operands.  On  this  assumption  we 
consider  the  problem  of  which  of  the  desirable  features  can  be  represented  within  the 
instruction  word.  This  represents  little  limitation  on  the  scope  of  our  methods.  Data 
obtained by them are certainly valid arguments in discussions of instruction word size, and
the changes in the methods needed to handle more esoteric cases of mixed word lengths are
mostly trivial.

The  next  issue  brought  up  is  the  division  of  the  instruction  word  into  fields.  Each  field 
represents  some  capability  of  the  ISP,  such  as  operator  selection,  addressing  mode  selection, 
operand  selection  etc.  Which  capabilities  to  include  is  an  open  question,  indirect  addressing 
and base register addressing being cases in point.


Having  decided  which  capabilities  are  wanted,  there  is  the  question  of  the  size  of  each  field, 
and  which  functions  to  include  for  each  capability. 

† A similar argument holds for data word lengths; in that case it is the need for accuracy
which pushes towards longer words.

†† In particular this depends on the application.

Knowing the relative values of the possible functions in a capability and given its field size,
one may select a set of functions for it. Some idea of the relative merits of functions from
different  capabilities  is  necessary  to  decide  on  the  field  sizes,  or  on  the  desirability  of  having 
a given  capability  at  all.  Note  that  a function  becomes  particularly  expensive  when  the  field 
capacity*  of  that  capability  is  about  to  be  exhausted.  This  means  trading  it  against  a 
considerable  reduction  in  some  other  capability  or  against  an  increase  in  the  instruction  word 
size. In fact, the cost paid is usually that of doubling** the number of functions. Once this
cost  has  been  paid,  however,  functions  that  would  not  otherwise  have  been  considered,  can 
be  implemented  cheaply. 
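
The doubling effect can be made concrete with a small calculation. The sketch below assumes a binary instruction word in which a capability with n functions needs a field of ceil(log2 n) bits; the names `field_bits` and `spare_codes` are ours and serve only as illustration.

```python
def field_bits(n_functions: int) -> int:
    """Bits needed to select one of n_functions in a binary field."""
    if n_functions < 1:
        raise ValueError("a capability needs at least one function")
    return (n_functions - 1).bit_length()


def spare_codes(n_functions: int) -> int:
    """Unused encodings left in the smallest field that holds n_functions."""
    return (1 << field_bits(n_functions)) - n_functions


# A 3-bit field holds 8 functions with nothing to spare ...
assert field_bits(8) == 3 and spare_codes(8) == 0
# ... so the 9th function costs a whole extra bit, after which
# 7 further functions fit without widening the field again.
assert field_bits(9) == 4 and spare_codes(9) == 7
```

The extra bit must be taken from some other field or added to the instruction word, which is the trade described above.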

The  goal  of  our  methods  is  to  estimate  the  relative  costs  and  usefulness  of  capabilities  and 
their functions. They thus give exactly the kind of information that sheds light on the
problems of how to allocate the instruction word space to capabilities and functions.

The  allocation  of  functions  to  capabilities  is  not  unique.  Also  structural  changes  in  one 
capability  may  imply  significant  changes  in  another.  One  example  is  provided  by  two  address 
ISPs.  When  both  operands  can  be  accessed  by  a full  address,  the  traditional  LOAD  and 
STORE  instructions  are  subsumed  by  a MOVE  instruction.  Another  example  is  the  handling  of 
I/O  devices.  Commonly  there  are  instructions  like  "connect",  "send  function"  and  "read 
status" to control these. On the PDP-11 this is not so. The relevant registers of the
external  devices  have  been  allocated  functions  in  the  addressing  capability  and  the  above 
instructions  are  subsumed  under  the  MOVE  instruction.  Yet  another  example  is  provided  by 
general  registers.  If  these  are  part  of  the  addressing  space,  register  to  register  functions 
are not needed in the operation capability; they are subsumed under the memory to register
functions. 
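
The I/O example can be sketched as a toy model. This is not PDP-11 code: the class `AddressSpace`, the device and its address 1000 are invented for illustration. The sketch shows only how placing device registers in the addressing space lets one MOVE instruction stand in for LOAD, STORE and dedicated I/O instructions.

```python
class AddressSpace:
    """Toy address space in which some addresses are device registers."""

    def __init__(self, memory_words: int):
        self.memory = [0] * memory_words
        self.devices = {}  # address -> (read function, write function)

    def map_device(self, address, read, write):
        self.devices[address] = (read, write)

    def load(self, address):
        if address in self.devices:
            return self.devices[address][0]()
        return self.memory[address]

    def store(self, address, value):
        if address in self.devices:
            self.devices[address][1](value)
        else:
            self.memory[address] = value


def move(space, src, dst):
    """MOVE src -> dst; both operands are full addresses."""
    space.store(dst, space.load(src))


# Hypothetical output device with one data register at address 1000.
printed = []
space = AddressSpace(memory_words=64)
space.map_device(1000, read=lambda: 0, write=printed.append)

space.memory[5] = 42
move(space, 5, 6)      # memory to memory: covers LOAD followed by STORE
move(space, 5, 1000)   # memory to device register: covers "send function"
assert space.memory[6] == 42 and printed == [42]
```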


2.2  Time  cost 

The primary time cost is the time the central processor spends executing the program. For
reasons explained in Section 1.3 the primary time cost excludes time spent in interrupt
handling, whether the program’s own or others’. Unless specifically mentioned, the term time
cost is used to mean primary time cost.


* Usually  some  power  of  two. 

** Assuming a binary instruction word.

Execution time cannot be measured directly by our methods. We propose three
approximations: 

One  is  the  instruction  count,  i.e.  the  number  of  instructions  executed.  This  suffers  from  the 
inaccuracy  caused  by  assuming  that  all  instructions  execute  in  the  same  time.  This  is  further 
discussed below. Modifications could be made depending on addressing mode (particularly
indirection)  and  other  features.  This  was  not  done  in  our  case.  The  major  advantage  of  this 
measure  is  the  ease  with  which  it  is  computed,  and  its  independence  of  technology*  and 
processor  implementation.  The  instruction  count  also  has  another  quality:  In  addition  to 
being  a crude  measure  of  time,  it  is  a precise  measure  of  the  number  of  opportunities  there 
have  been  to  express  something  in  the  program. 

For  many  designs,  the  memory  reference  count  may  be  more  appropriate.  The  PDP-11  is  a 
good  example  of  this,  since  for  the  same  data  operation  the  number  of  memory  accesses 
varies  depending  on  addressing  mode.  In  case  of  the  ADD  instruction  the  number  of  memory 
accesses  may  thus  vary  between  1 and  7. 

If  there  is  no  overlapping  between  instruction  executions,  a more  accurate  measure  is  the 
computed  time,  that  is  the  sum  of  the  execution  times  of  all  instructions  executed.  Even  this 
is  inaccurate  since  execution  times  of  many  instructions  depend  on  operand  values  or  lengths 
and  also  on  hardware,  like  primary  memory  cycle  time.  The  latter  may  vary  even  within  the 
same  run  if  the  job  is  swapped.  However,  the  time  obtained  in  this  way  is  probably  as 
accurate  as  that  used  for  accounting  and  other  purposes  by  operating  systems,  where 
operating  system  overhead  and  interrupt  handling  on  behalf  of  other  jobs  often  is  a major 
source  of  errors. 
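
As an illustration, the three approximations can be computed from an execution trace as follows. This is a minimal sketch and not the analysis programs used in this work: the trace format, the `TraceRecord` fields and the timing figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TraceRecord:
    """One executed instruction (hypothetical trace format)."""
    opcode: str
    memory_refs: int      # memory accesses made, including the instruction fetch
    exec_time_us: float   # nominal execution time in microseconds


def time_costs(trace: Iterable[TraceRecord]) -> dict:
    """Return the three approximations of the primary time cost."""
    instruction_count = 0
    memory_reference_count = 0
    computed_time_us = 0.0
    for rec in trace:
        instruction_count += 1
        memory_reference_count += rec.memory_refs
        computed_time_us += rec.exec_time_us
    return {
        "instruction_count": instruction_count,
        "memory_reference_count": memory_reference_count,
        "computed_time_us": computed_time_us,
    }


def rate_kips(instruction_count: int, computed_time_us: float) -> float:
    """Average execution rate in thousand instructions per second."""
    return instruction_count * 1000.0 / computed_time_us


# Invented three-instruction trace; the times are not measured values.
trace = [
    TraceRecord("MOVE", 2, 2.0),
    TraceRecord("ADD", 2, 3.0),
    TraceRecord("FMP", 2, 5.0),
]
costs = time_costs(trace)
rate = rate_kips(costs["instruction_count"], costs["computed_time_us"])
```

The ratio of the first measure to the third gives an average execution rate, which indicates how far the instruction count departs from the computed time.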

We  may  get  an  indication  of  the  inaccuracy  of  the  instruction  count  as  a measure  of  the  time 
cost  by  comparing  it  with  the  computed  time.  This  is  done  in  Figure  3-4,  which  displays 
the average instruction execution rate for our subject set in units of thousand instructions
per  second  of  computed  time  (kips  - kilo  instructions  per  second).  As  the  table  shows,  this 
rate  varies  from  210  to  417  kips,  with  an  average  of  324  kips  and  a standard  deviation  of 
63. Hence the instruction count may vary by a factor of 2 for programs of the same
computed time. Assuming the computed time to be close to correct, we may conclude that
the instruction count is not overly accurate as a measure of time. We still use it, however,
for the stated reasons.

* A faster floating point unit would make a great difference in the execution time for many
programs, but not in the instruction count. In one of our subject programs (Aitken E, see
Section 3.2.2), 23% of the executed instructions, consuming 54% of the computed time,
are for floating point arithmetic.

For  a central  processor  where  there  is  overlap  between  instruction  executions  the 
instruction  count  may  be  sufficient.  Alternatively  an  interpreter  for  the  instruction 
dispatching  mechanism  may  be  programmed  and  an  appropriate  version  of  computed  time 
obtained.  The  choice  depends  on  whether  one  wants  to  evaluate  the  instruction  set  as  such, 
or the processor that executes it. Such an interpreter might introduce additional
inaccuracies.

The  relations  between  the  time  and  space  costs  through  the  instruction  word  are  described 
in Section 2.1. The tradeoff discussed there applies to all capabilities and functions of the
instruction  word,  and  also  to  the  implied  data  types. 

The secondary time cost is the time spent in operating system functions on behalf of the
running job. This can be measured by clock or by using operating system routines as the
subject programs of the analysis. This cost is influenced by the space cost as discussed in
Section 2.3.


2.3  Space  cost 

This  is  the  cost  of  the  primary  memory  that  a program  occupies  for  code  and  data  (static  and 
dynamic).  The  importance  of  this  cost  follows  from  the  relatively  high  cost  of  primary 
memory, which is commonly an expensive part of a computer installation†.

Contributing to the space cost are instruction space and data space. Given an application, both
of these will vary with the ISP, in particular with the available data types and their
operators. Variations in register structure and control operators will influence program
space and space for temporary storage.


† With the current trend towards semiconductor memories, the technology is the same for the
memory  and  the  processor.  Since  the  memory  is  usually  much  larger  (in  gates),  memory  cost 
will  continue  to  be  high  until  another  technology  becomes  economical. 


Space cost is best measured by static methods or by estimation based on miscellaneous
assumptions  as  relevant  in  the  particular  case.  The  data  space  for  dynamic  data  structures 
can  not  be  measured  by  static  means.  It  can  be  measured  by  dynamic  methods,  but  we 
present  no  method  for  this  at  the  present  time. 

For  static  methods  one  may  rely  on  the  compiler  in  question  to  produce  the  statistics,  or  a 
special  program  may  analyze  core  images,  relocatable  programs  or  some  similar  general  form 
of  the  program.  The  first  approach  suffers  from  lack  of  generality  as  discussed  in  Section 
1.2.1.  The  second  may  have  inaccuracies  due  to  the  difficulty  of  distinguishing  instruction 
words  from  data,  in  particular  constants  and  descriptors.  This  inaccuracy  depends  on  the 
central processor structure; it will be small or nonexistent on a central processor where code
and  data  are  completely  separate,  as  on  the  HP  3000. 

Space  cost  is  measured  in  bits,  alternatively  in  words.  Whenever  we  estimate  this  cost  there 
will  be  inaccuracies  inherent  in  the  particular  assumptions  made.  These  will  be  discussed  in 
each  case. 

Memory  access  width  relates  the  space  and  time  costs  by  forcing  unnecessary  space  to  be 
used  rather  than  increasing  the  time  cost.  Memory  access  width  is  again  influenced  by  the 
amount  of  space  necessary  for  representing  data  types.  Dynamic  methods  may  be  desirable 
here,  to  determine  the  space  necessary  to  represent  the  actual  significance  of  numerical 
operands  (See  Section  5.5). 
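
The effect of memory access width on space can be illustrated by a small calculation. The sketch assumes, for illustration only, that storage is granted in whole access units, so that each item occupies a multiple of the access width; the function name and the figures are hypothetical.

```python
def allocated_bits(significant_bits: int, access_width: int) -> int:
    """Bits occupied by one item when storage is granted in whole access units."""
    if significant_bits < 0 or access_width <= 0:
        raise ValueError("size must be non-negative and width positive")
    units = -(-significant_bits // access_width)  # ceiling division
    return units * access_width


# An operand needing only 12 significant bits, stored in a 36 bit wide memory:
needed = 12
used = allocated_bits(needed, 36)
wasted = used - needed
assert (used, wasted) == (36, 24)
```

Data on the actual significance of operands would supply the `significant_bits` figure in such an estimate.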

Also  space  cost  relates  to  time  cost  through  the  instruction  word  as  discussed  in  Section  2.1. 
For  a computer  with  a dynamic  memory  management  (paging,  overlaying)  there  will  be  an 
associated  secondary  time  cost  for  this  function  which  usually  increases  with  the  space  cost. 
In  a multiprogrammed  situation  there  will  also  be  a relation  to  secondary  time  cost  through 
central  processor  idle  time  whenever  the  program  is  difficult  to  multiprogram.  This  also 
increases  with  the  space  cost. 


2.4  Programming  cost 

This  cost  may  be  broken  down  as  cost  of  design  and  coding,  debugging  and  maintenance. 
Costs  incurred  by  errors  during  production  runs  may  also  be  included.  Each  of  these  is  often 
a significant fraction of the costs associated with a program. The most important way of
reducing the programming cost is to write programs in high level languages. However, for
efficiency reasons, and in order to gain access to machine features, much coding still takes
place  in  assembly  languages.  Similarly  most  debugging  is  done  by  means  of  assembler 
oriented  debuggers,  or  at  least  requires  good  knowledge  of  the  representation  of  the 
program  in  ISP  terms.  Hence  a good  ISP  architecture  contributes  to  reducing  this  cost  in 
several  ways: 

By supporting high level languages and other good programming methodologies. This
includes  techniques  for  program  factorization,  like  subroutines,  coroutines  and  separately 
compiled  modules,  which  should  be  well  supported  by  the  ISP.  Also  important  are  natural 
representations  for  a rich  set  of  other  control  operators  and  their  associated  data 
structures. 

By  supporting  program  security.  A program  should  be  protected  against  its  own  errors 
as  well  as  those  of  other  programs.  The  instruction  set  should  not  encourage  the 
programmer  to  make  unnecessary  mistakes,  and  the  ISP  should  permit  inconsistencies  to 
be detected during execution†. Possible dynamic checks could be: consistency of data
types  and  operators,  validity  of  effective  address  with  respect  to  named  data  structure, 
consistency  of  control  operators  and  their  data  etc.  The  standard  techniques  for 
protection  against  other  programs  are  to  a lesser  extent  relevant  to  our  subject. 

By having the right operators. That is, as few operators as possible should have to be
fabricated from existing ones. This contributes to understandability. For particular
languages  or  application  areas  instructions  for  indexing  in  two  dimensions,  parameter 
checking,  etc.  might  be  relevant. 

By  being  clean  and  elegant.  This  means  that  the  capabilities  and  their  functions  should 
be  well  defined  and  conceptually  well  separated  (orthogonal).  There  should  be  few  and 
well  defined  instruction  word  formats.  The  data  types  and  control  operators  should  be 
well defined, and their representations should be easily understandable. General
concepts  should  be  preferred  to  special. 

The  methodology  and  elegance  dimensions  of  this  cost  are  currently  not  quantifiable  except 
by purely subjective evaluation. Personal biases and preferences will have a strong
influence. As for the security dimension, the cost and value of proposed checking
mechanisms can be estimated using our methods to obtain data on dynamic usage. We also
provide methods for evaluating existing and missing operators, namely the frequency counts
and FGR function (Section 5.1 through Section 5.1) and the sequences (Section 5.2).

† Wirth [WirN72] has stated the case for this form of security and its dependence upon the
ISP very eloquently. See Section 1.4.

Except  for  the  "right  operators"  dimension,  most  of  the  programming  cost  is  accumulated  over 
features  missing  from  the  ISP.  Introduction  of  new  features,  to  lower  the  programming  cost, 
will  usually  be  at  increased  space,  time  and  hardware  costs.  However,  a generalization  of 
existing  features  will  often  entail  a reduction  of  all  costs. 

We have discussed this cost partly to point out that security measures can be built into the
ISP at some (often low) cost in space and time, and that our methods can be used to estimate
these costs. We also want to point out that we do not advocate rushing headlong into making
some improvement suggested by our methods to save space or time, without considering the
issues  just  discussed. 


2.5  Hardware  cost 

This  is  the  cost  of  the  hardware  of  the  central  processor  needed  to  implement  a feature. 
Given  the  approximate  computing  power  of  the  processor  and  its  general  structure,  the 
varying  part  is  mostly  a cost  of  electronic  circuitry.  Since  the  cost  of  integrated  circuits  is 
rapidly  falling  and  becoming  a small  fraction  of  the  cost  of  a computer  system,  the  hardware 
cost  is  becoming  less  significant. 


Estimating the hardware cost is outside the scope of this thesis. As a general rule each
feature  introduced  into  the  ISP  will  increase  it,  less  so  if  the  new  feature,  or  part  of  it,  is 
subsumed  under  an  already  existing  concept  and  using  existing  hardware.  It  follows  that  an 
increased  hardware  cost  is  usually  the  consequence  of  an  improvement  designed  to  reduce 
the  space  and  time  costs. 

Time  cost  can  be  reduced  by  using  faster  circuits,  thus  increasing  the  hardware  cost.  This  is 
irrelevant to the ISP architecture. Hardware cost is independent of space cost; its relation to
programming cost is discussed in Section 2.4.

CHAPTER 3

VALIDATION  STRATEGY 


A major  concern  of  our  research  has  been  to  establish  the  validity  of  the  methods  we  have 
developed.  We  wanted  to  ascertain  that  they  apply  with  more  or  less  equal  generality  to  the 
ISP  structures  outlined  in  Section  1.3  and  to  all  application  areas  where  this  class  of 
processors  is  commonly  used.  We  wanted  to  be  confident  that  the  results  obtained  by  using 
them  reflect  general  requirements  of  programmers,  algorithms,  languages  and  compilers 
rather  than  idiosyncrasies  of  particular  instances  of  such.  Specifically  we  wanted  to  assess 
the  influence  of  each  source  of  variation  on  our  results. 

The sources of variation can be grouped as:

Variation  due  to  algorithm. 

Variation  due  to  programmer. 

Variation  due  to  language  used. 

Variation  due  to  the  particular  implementation  of  that  language  (including  the  operating 
system). 

Variation  due  to  the  ISP. 

One might also want to consider variation due to choice of representations, particularly for
data  structures.  This  variation  is  closely  related  to  those  due  to  algorithm,  programmer  and 
language,  and  we  do  not  treat  it  as  a separate  source  of  variation  here. 

The validity of the results has been judged by several criteria:

The methods confirm already known efficiencies or deficiencies of the ISP considered.

The  methods  give  new  insight  into  deficiencies  or  efficiencies  of  the  ISP  which  are 
subsequently  verified  by  other  means. 

The methods themselves may measure or illuminate the same property of the ISP from
several  angles  and  these  results  corroborate  each  other. 

In  special  cases  the  approximate  measures  found  can  be  compared  against  direct 
measurements.

In this chapter we describe some simplifying assumptions which were made, and how we
chose  a subject  set  in  order  to  investigate  the  influence  of  the  above  sources  of  variation. 
As  the  presentation  of  each  method,  and  the  experimental  results  obtained  by  it,  is 
concluded,  we  also  discuss  the  results  in  view  of  this  validation  strategy.  Finally  these 
discussions  are  summarized  in  Section  8.2. 


3.1  Some  simplifying  assumptions 

To make a full scale investigation of the effects of all these sources of variation would be a
major  programming  task.  Particularly  costly  is  tracing  on  several  ISPs,  and  selecting  the 
subject  programs  from  a wide  area  of  applications.  Firstly  we  would  need  an  interpreter 
program  for  each  of  the  ISPs  to  be  investigated.  Secondly,  we  would  have  to  change  the 
analysis  programs  to  reflect  the  other  ISPs*.  Thirdly,  in  selecting  subject  programs  we  would 
need  several  programs  from  each  major  area  of  application.  These  would  have  to  be  coded 
in  each  of  the  selected  languages  and  brought  to  run  on  each  of  the  selected  ISPs  before 
analysis  of  them  could  start.  The  analysis  would  entail  a large  expense  in  computer 
resources  and  the  result  would  bring  on  us  a data  reduction  problem  of  considerable 
magnitude.  In  addition  it  would  involve  locating  and  consulting  experts  in  each  application 
area. 

* There is an obvious advantage in running the analysis programs on the same processor as
is traced, since many of the representations have obvious and efficient formats. Most of our
programs were written in FORTRAN to ease portability, but even so many of the
representations would have to be changed when tooling for another ISP.

We believe that we have legitimately evaluated our methods without going to this large scale
investigation, by introducing two simplifying assumptions:

1) We restricted ourselves to one ISP, viz. the PDP-10. This alleviated the first two
difficulties above, but deprived us of the possibility of investigating the variation due
to a change of ISP. Almost all of our experimental results would change if we
performed our analyses on a different ISP, particularly the results for register
utilization, details of instruction sequences, and addressing. In some cases the
methods would have to be modified, or new methods developed, to handle special
features  of  particular  ISPs*.  We  believe  this  to  be  of  little  importance  in  the  present 
context.  Our  goal  was  to  assess  the  ability  of  our  methods  to  detect  the  utilities  and 
costs  of  features  in  ISPs,  as  opposed  to  comparing  ISPs.  Since  our  methods  justified 
themselves  for  one  ISP  we  feel  confident  they  will  work  satisfactorily  for  most. 
Analogously,  if  we  were  developing  methods  to  determine  the  cost/utility  ratio  of 
programming language features based on their usage, we would certainly measure the
performance  of  programs  on  several  ISPs  but  we  might  well  restrict  ourselves  to  one 
language  provided  it  were  sufficiently  rich.  Further  justification  follows  from  the 
generality  of  the  PDP-10  as  discussed  on  page  9.  If  the  findings  of  our  validation  did 
not  have  a certain  generality  to  them  we  would  suspect  this  assumption  of  failing.  As 
it  is,  we  don’t. 

2)  We  restricted  ourselves  to  one,  albeit  rather  general,  area  of  application.  This 
reduced  the  set  of  subject  programs  to  manageable  proportions.  Again,  we  believe 
that  since  our  methods  showed  their  worth  in  evaluating  an  ISP  over  one  application 
area  then  they  can  be  applied  over  a spectrum  of  areas,  separately  or  in  union.  We 
would  expect  the  findings  to  differ  from  area  to  area  but  mostly  in  data  types  and 
data  operators.  This  is  probably  the  best  understood  part  of  the  domain  that  our 
methods  can  be  applied  to  and  hence  of  least  importance  to  us.  We  would  also 
expect  data  accessing  methods  to  be  influenced  by  the  application  and  our 
assumption deprived us of assessing this influence. Considering this assumption, we
restricted our study to programs mostly from the area of technical and scientific
computations, but with some other programs included, in particular compilers.

We  summarize  this  discussion  as  follows:  The  intended  goal  of  our  methods  is  to  evaluate 
features of ISPs as suitable for a given general or specialized application area. Our main
concern  in  validating  the  methods  was  to  assess  the  influence  of  factors  not  related  to  the 
ISP  or  to  the  area  of  application. 


* Consider  the  IBM  360  ISP  as  an  example,  and  compare  it  with  the  PDP-10.  Base  register 
addressing  would  imply  that  more  registers  would  be  used,  and  that  information  about 
addressing  would  become  more  important.  The  differences  in  instruction  sets  would  imply 
changes,  at  least  in  detail,  of  the  instruction  sequences.  Also  methods  for  investigation  of  the 
use  of  condition  codes  would  have  to  be  implemented. 


3.2 Selection of data

Again,  since  we  evaluated  the  methods,  and  not  any  particular  ISP,  we  were  not  worried  that 
our  selection  of  subject  programs  quantitatively  constituted  a fair  representation  of  any 
actual  workload.  Rather  we  wanted  to  see  all  programming  structures  that  occur  with  some 
minimal  frequency  in  real  world  programs  represented  in  our  test  sample.  To  estimate  the 
influence  of  the  various  sources  of  variation  we  studied  the  behaviour  of  several  versions  of 
the  same  several  algorithms,  programmed  by  different  programmers,  in  different  languages 
and,  if  possible,  compiled  by  different  compilers  for  the  same  language. 


3.2.1  Language  selection 

To  study  the  language  variation,  we  selected  four  available  languages  suited  to  the  chosen 
application area, namely: FORTRAN, ALGOL, BASIC† and BLISS. These languages cover a
range  of  age,  degree  of  security,  inherent  efficiency  and  structure: 

FORTRAN  [IBM56],  [USAS66]  was  designed  about  1954  but  has  since  been  modified  and 
extended considerably. ALGOL [NauP63] was designed in 1957-60, BASIC [KemJ61] in the
early sixties, and BLISS [WulW70] around 1969.

In  terms  of  control  structures,  including  program  factorization  mechanisms,  all  the  chosen 
languages  have  looping  and  conditional  constructs.  BASIC  is  the  poorest,  having  subroutines 
but  no  local  names.  FORTRAN  has  more  structure,  particularly  subroutines  and  localized  data. 
ALGOL  has  even  more,  notably  the  compound  statement  with  its  consequences  for  the  other 
control  structures,  block  structure,  and  an  advanced  parameter  mechanism.  BLISS  is 
comparable  to  ALGOL,  with  a simpler  parameter  mechanism,  but  it  has  coroutines,  and  intra 
routine  control  structures  so  rich  that  a general  GO  TO  has  been  omitted.  This  contributes 
towards  better  structured  programs. 

For  data  structures,  FORTRAN,  BASIC  and  ALGOL  all  have  vectors  and  multidimensional 
arrays,  BLISS  has  any  data  structure  which  the  programmer  cares  to  define. 


† To obtain a fair comparison of the language structures involved, we did not use the matrix
operators of BASIC where they would normally be called for.

BASIC has only one type†, floating point, converting to integer indexes automatically as
needed.  ALGOL  and  FORTRAN  have  several  arithmetic  types  with  automatic  type  conversion, 
and  also  a Boolean  type.  BLISS  has  no  types  but  relies  on  the  written  operator  to  determine 
the  correct  operation. 

FORTRAN  and  BLISS  have  almost  no  run  time  checking,  BASIC  checks  array  bounds,  ALGOL 
does  this  and  also  has  extensive  checking  of  parameters  including  type  conversion. 

BLISS  generates  the  most  efficient  object  programs,  largely  due  to  a highly  optimizing 
compiler.  FORTRAN  programs  are  efficient,  ALGOL  programs  are  less  efficient  due  to  the  high 
degree  of  security  and  to  the  precise  definition  of  evaluation  order  in  the  context  of  possible 
side  effects.  BASIC  programs  are  inefficient  due  to  a particularly  fast  and  dirty  compiler. 

It  follows  that  our  languages  span  most  of  the  variations  found  within  commonly  used 
languages  for  scientific  and  technical  calculations. 


3.2.2  The  subject  set 

For our subject programs we first selected six algorithms from the "Collected Algorithms
from the Communications of the ACM" (CALGO). The selection was made in such a way that
it  included  as  many  as  possible  of  the  common  data  types,  data  structures,  control  structures 
and  parameter  forms  found  in  higher  level  languages.  We  also  attempted  to  cover  as  wide  a 
range  as  feasible  of  the  modified  SHARE  classification,  used  by  CALGO  to  classify  the 
algorithms.  Other  criteria  used  in  the  selection  were: 

The algorithm must have a reasonable size: large enough to contain the interesting
features in context, but small enough to be coded in all four languages, traced and
analyzed  in  a reasonable  time. 

The  remarks  and  certifications  in  the  CALGO  collection  should  not  indicate  that  trouble 
might  be  expected  using  the  algorithm. 

The  subject  matter  of  the  algorithm  should  be  sufficiently  known  to  this  author  that  he 
could  detect  obvious  errors  in  the  published  algorithm  and  in  his  various  versions  of  it. 


Writing a main program for the algorithm should be straightforward.

† Excluding the string type, which we don’t use.

The  CALGO  algorithms  selected  are  briefly  described  in  Figure  3-1,  along  with  the  rest  of 
the  subject  set.  This  set  of  algorithms  gives  us  a good  indication  of  the  variations  due  to 
algorithm  and  language.  Listings  of  all  the  ALGOL  versions,  all  4 versions  of  PERT,  and  all  5 
versions  of  Aitken,  are  reproduced  in  Appendix  E. 

The  language  structures  searched  for,  showing  how  they  occur  in  the  selected  algorithms, 
are  tabulated  in  Figure  3-2.  The  statement  count  given  is  the  approximate  number  of 
ALGOL statements† in the published version, included as a measure of the coding effort. As
is  seen  from  the  table,  several  of  the  desired  structures  are  not  represented.  Double 
precision  arithmetic  is  only  present  in  one  algorithm,  Crout,  very  locally  in  space  (though  not 
in  time),  and  only  in  the  ALGOL  and  FORTRAN  versions  since  BLISS  and  BASIC  do  not  support 
this type. Complex arithmetic is only marginally present, since Bairstow’s method finds
complex  roots  but  does  no  calculations  using  them  and  no  variables  are  declared  of  this  type. 
Bit  manipulation,  bit  vectors  and  characters  are  not  used  by  any  of  these  algorithms.  Note 
also  that  real  arithmetic  in  treesort  is  present  only  to  the  extent  in  which  it  is  needed  for 
comparisons  of  magnitude,  or  for  initialization. 

Only Crout's method uses two dimensional arrays and we found no suitable algorithm using
arrays of 3 or more dimensions**, and no triangular or ragged arrays. We also found no
suitable algorithms using record structures or lists, although Treesort uses linked structures.

We found a rich selection of GO TOs***, conditionals and loops, and one instance of a CASE
statement (switch, computed GO TO). Since only BLISS and ALGOL support recursion, and this
feature is little used in published algorithms, we did not include it. For the same reason we
included no algorithm using label parameters. Other parameter forms are well represented.
In particular, Ising passes procedure names as parameters, and for this reason could not
be coded in BASIC.

* Not counting <block>s and <compound statement>s. Thus "IF B THEN BEGIN A:=X+1; I:=I-1
END ELSE A:=X-1;" counts as 4 statements.

** Knuth [KnuD70] reports that 1.4% of the static variable occurrences in his FORTRAN sample
have 3 or 4 indices or parameters. He does not distinguish function calls from array accesses.
Assuming functions of many parameters to be more common than arrays of many dimensions,
this supports our findings.

*** Most of the GO TOs caused little problem when translating into BLISS; an exception was
the Bairstow program, which required artificial loops, compounds and a function.
FIGURE 3-1

Description of the subject set.


CALGO no. 30
Bairstow  Bairstow/Newton method for polynomial roots.

Author: K. W. Ellenberger. Corrections by W. J. Alexander, K. J. Cohen and
J. J. Kohfeld.

Modified SHARE category C2: Zeroes of polynomials.

Data: Initialization by explicit assignments.

This is a classical algorithm for the problem.

CALGO no. 43
Crout  Crout's method for linear equations with pivoting.

Author: H. C. Thacher. Corrections by C. Domingo and F. Roderiguez-Gil.

Modified SHARE category F4: Linear equations.

Data: Matrix values computed by simple expressions. Logarithm used for
right hand sides.

A classical algorithm for the problem.

CALGO no. 113
Treesort  Treesort.

Author: R. W. Floyd.

Modified SHARE category M1: Sorting.

Data: Initialization by simple expression. Initial order is inverse of desired.

A logarithmic sorting algorithm.

CALGO no. 119
PERT  Evaluation of a PERT network.

Authors: B. Eisenman and M. Shapiro. Corrections by L. S. Coles.

Modified SHARE category H: Operations research, graphs.

Data: Initialization by explicit assignments.

A somewhat speeded up algorithm for this problem.

CALGO no. 257
Havie  Numerical integration by Havie's method.

Author: R. N. Kubick.

Modified SHARE category D1: Quadrature.

Data: Integrands are simple expressions involving square root or
exponential.

A modified Romberg integration.

CALGO no. 355
Ising  An algorithm for generating Ising configurations.

Author: J. M. S. Simoes Pereira.

Modified SHARE category Z: All others.

Data: Maximal n read from teletype; n, x and t varied by loops over all
significantly different combinations.

An (x,t) Ising configuration is a sequence (S1,...,Sn) of zeroes and ones such
that:

$\sum_j S_j = x$ and $\sum_j |S_{j+1} - S_j| = t$

The problem is of interest in theoretical physics.

This algorithm was included mainly because routine calls are its most
important control structure. Since routine names are passed as parameters
it could not be coded in BASIC.


Aitken  N-point polynomial interpolation.

Authors: M. R. Barbacci, L. E. Flon, G. N. J. Rolf, W. A. Wulf and A. Lunde.
(Each contributed one version of the algorithm. The slowest version was
omitted. The fastest (and shortest) version was further improved by about
10% in speed and size, and included. Hence five versions of this algorithm
were used.)

Modified SHARE category E1: Interpolation.

Source  language:  BLISS. 

Data:  Natural  logarithm  tabulated  at  irregular  intervals  by  loop. 

Standard  polynomial  interpolation. 

SEC  Zeroes  of  simultaneous  nonlinear  equations  by  secant  method. 

Author:  G.  W.  Stewart. 

Modified SHARE category C5: Zeroes of transcendental functions.

Source  language:  FORTRAN 

Data:  Functions  are  linear  combinations  of  linear  and  quadratic  terms  in  the 
variables,  parameters  read  from  teletype. 

The  program  was  designed  for  research  in  the  problem  area  and  method. 

FORFOR  Compiler  for  FORTRAN. 

Source  language:  Assembler. 

Data:  FORTRAN  version  of  the  Treesort  algorithm. 

A compiler  of  the  Digitek  design,  simulating  a one-accumulator  processor. 

FORTEN  Compiler  for  FORTRAN. 

Source  language:  BLISS. 

Data:  FORTRAN  version  of  the  Treesort  algorithm. 

A compiler  doing  flow  analysis  and  generating  efficient  code. 

ALGOL  Compiler  for  ALGOL. 

Source  language:  Assembler,  structured  control  by  macros. 

Data:  ALGOL  version  of  the  Treesort  algorithm. 

A fast  ALGOL  compiler  generating  efficient  code  (for  ALGOL).  Language 
slightly  extended. 

BASIC  Compile  and  link  phases  of  the  BASIC  system. 

Source  language:  Assembler. 

Data:  BASIC  version  of  the  Treesort  algorithm. 

A fast  compiler  generating  extremely  inefficient  code. 

BLISS  Compiler  for  BLISS 

Source  language:  BLISS. 

Data:  BLISS  version  of  the  Treesort  algorithm. 

A slow  compiler  generating  efficient  and  small  code. 
FIGURE 3-2

Language  properties  of  the  small  subject  algorithms: 
x means  property  present  in  algorithm. 

- means  property  marginally  present  in  algorithm. 
A related source of variation is that of language implementation. Luckily the PDP-10 has two
FORTRAN  systems,  FORTRAN  40  and  FORTRAN  TEN,  here  denoted  FORFOR  and  FORTEN  or 
simply  FOR  and  TEN.  Hence  we  had  an  obvious  way  of  assessing  this  variation.  We  analyzed 
all  the  CALGO  algorithms  plus  SEC  (see  below)  using  both  of  the  FORTRAN  systems.  Due  to  a 
suspected  bug  in  TEN,  we  did  not  use  the  optimize  option  of  TEN  when  compiling  our 
programs.  The  various  versions  of  these  algorithms  will  be  denoted  ALGOL  Ising,  BASIC 
Crout  etc. 

To estimate the variations due to programmer habits we included 5 versions of an algorithm
as coded in BLISS by 4 experienced programmers. The algorithm was polynomial
interpolation*, which nicely completed our coverage of the modified SHARE categories. BLISS
was chosen since it gives the programmer more alternative forms of expression than do the
other languages. This was thought to be of importance considering the small size of the
algorithm. These five programs are denoted by the letters L, G, B, A and E (efficient).

For  each  of  these  algorithms  a main  program  was  written,  to  provide  data  for  the  algorithm 
and  present  the  results.  To  initialize  the  data  for  the  algorithms  we  used  explicit 
assignments  of  either  constants  or  calculated  values,  usually  simple  expressions  involving  the 
indices  of  the  variables  to  be  initialized.  A short  indication  of  the  method  used  in  each  case 
is  given  with  the  description  of  the  algorithm  in  Figure  3-1. 

After  a few  trial  traces  it  became  obvious  that  input  and  output  accounted  for  a large 
fraction  of  the  total  activity.  Not  only  did  format  interpretation  take  much  time,  but  also 
channel  and  file  initialization  and  status  checking.  We  therefore  decided  to  leave  I/O  out  of 
the  traced  part  of  the  algorithms,  with  a few  exceptions:  one  parameter  to  the  Ising  program 
is  read  from  the  teletype,  and  a minimal  output  was  included  in  some  cases. 

Our sample so far had one major deficiency: all the programs traced were small. To rectify
this we traced all the compilers involved, that is the ALGOL and BLISS compilers, the compile
and link phases of the BASIC system and the two FORTRAN compilers. All these traces were
made while compiling the appropriate version of the Treesort algorithm. An additional
benefit from this was that we got examples of many of the structures our CALGO sample did
not have, including bit manipulation, bit vectors, character handling, records, lists and
recursion. We also believe that compilers account for a large fraction of the resources used
in any installation and hence are of particular importance as constituents of sets of typical
programs.


* By Aitken's method as described in Milne [MilW49].

We further included one somewhat larger program from the technical scientific calculations
area: a program, SEC, to solve nonlinear simultaneous equations. This program was
analyzed using both versions of FORTRAN.

The resulting subject set consists of the 6 CALGO algorithms written in each of the 4
languages, the Aitken algorithm written in BLISS by 4 programmers, 5 compilers and the
large scientific numerical program. These programs are well distributed over the area
spanned by the modified SHARE classification. The following general categories are
represented:

B (Standard  functions)  by  the  integrands  for  Havie. 

C (Polynomials,  zeroes)  by  Bairstow  and  SEC. 

D (Integrals and differential equations) by Havie.

E (Polynomial approximation) by Aitken.

F (Matrix  operations)  by  Crout. 

G (Statistics,  permutations,  subset  generation)  by  Ising  (related). 

H (Operations  research,  graphs)  by  PERT. 

L (Compiling)  by  the  compilers. 

M (Sorting,  data  conversion)  by  Treesort. 

Z (Others)  by  Ising. 

The FORTRAN versions of the 6 CALGO algorithms, and also the large scientific program, were
analyzed as compiled using the two different FORTRAN compilers. Thus, since the BASIC
version of Ising was excluded, the sample altogether consisted of 41 traces. The traces vary
in size from 19000 to almost 600000 executed instructions. Altogether about 5.3 million
instructions were traced, corresponding to almost 16.8 seconds of CPU time (computed time)
on the KA10. This should give a good basis on which to evaluate the methods. The
computed time and instruction count of the subject set are tabulated in Figure 3-3. The
average instruction execution rate for each program is tabulated in Figure 3-4.
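
The count of 41 traces can be checked from the composition of the subject set given above. The following is only an illustrative sketch in Python (not a language used in this work); the variable names are chosen here, and the overall rate is derived from the rounded totals quoted in the text, so it is approximate.

```python
# Traces in the subject set, following the counts given in the text.
calgo_algorithms = 6
# FORTRAN is counted once per compiler (FORFOR and FORTEN).
language_systems = ["ALGOL", "BASIC", "BLISS", "FORFOR", "FORTEN"]

calgo_traces = calgo_algorithms * len(language_systems) - 1  # no BASIC version of Ising
aitken_traces = 5      # five BLISS versions of Aitken
sec_traces = 2         # SEC compiled by both FORTRAN systems
compiler_traces = 5    # ALGOL, BASIC, BLISS, FORFOR and FORTEN compilers

total_traces = calgo_traces + aitken_traces + sec_traces + compiler_traces
print("traces:", total_traces)

# Overall execution rate implied by the rounded totals (5.3 million instructions, 16.8 s).
overall_kips = 5.3e6 / 16.8 / 1000
print("overall rate in Kips:", round(overall_kips))
```

The overall rate obtained this way is a time-weighted figure and need not equal the unweighted average of the per-program rates.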
FIGURE 3-3

Time cost of the subject set.
Computed time in seconds.
Instruction count in 1000s.


Source language:                  ALGOL   BASIC   BLISS   FORFOR  FORTEN

Bairstow            time          0.12    0.45    0.09    0.08    0.08
                    instructions  36      156     23      21      19

Crout               time          0.32    0.49    0.25    0.43    0.23
                    instructions  115     163     62      109     63

Treesort            time          0.47    0.55    0.26    0.27    0.35
                    instructions  140     187     106     111     97

PERT                time          0.16    0.41    0.07    0.08    0.07
                    instructions  63      157     26      32      27

Havie               time          0.48    0.33    0.12    0.18    0.17
                    instructions  168     103     28      38      36

Ising               time          0.22    -       0.07    0.05    0.05
                    instructions  91      -       25      20      20

SEC                 time          -       -       -       2.08    1.94
                    instructions  -       -       -       541     497

Algorithm\Programmer              E       B       A       G       L

Aitken              time          0.18    0.19    0.21    0.41    0.44
                    instructions  44      47      60      143     139

                                  ALGOL   BASIC   BLISS   FORFOR  FORTEN

Assembler written   time          0.19    0.25    -       1.56    -
compilers           instructions  74      85      -       591     -

BLISS written       time          -       -       1.67    -       0.78
compilers           instructions  -       -       593     -       295

BLISS versions would have been faster if OWN vectors and matrices had been used instead
of LOCAL and parameter.


WARNING: The format of this table is slightly different from the standard table format of the
later chapters, first used in Figure 3-4.
FIGURE 3-4

Instruction execution rate of the subject set
in units of 1000 instructions per second (Kips)


Algorithm\language        ALGOL   BASIC   BLISS   FORFOR  FORTEN

Bairstow                  300     345     261     247     243
Crout                     362     330     249     256     277
Treesort                  300     339     401     412     275
PERT                      394     380     397     395     402
Havie                     351     308     230     210     219
Ising                     410     -       379     391     417
Secant                    -       -       -       260     256

Algorithm\Programmer

Aitken

Source progr.\Compiler    ALGOL   BASIC   BLISS   FORFOR  FORTEN

Treesort                  382     343     354     379     379


Max: 410, Min: 210, Average: 324, Standard dev.: 63.
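
The rates of Figure 3-4 correspond to the instruction counts of Figure 3-3 divided by the computed times. As an illustration only, the following Python sketch (Python is not used in this work; the helper name `kips` is chosen here) recomputes the Crout row from the Figure 3-3 entries and prints it beside the tabulated rates. Small differences are to be expected, presumably because the tabulated times and counts are rounded.

```python
# (computed time in seconds, instruction count in 1000s) for Crout, read from Figure 3-3.
crout = {
    "ALGOL": (0.32, 115),
    "BASIC": (0.49, 163),
    "BLISS": (0.25, 62),
    "FORFOR": (0.43, 109),
    "FORTEN": (0.23, 63),
}

# Rates for Crout as tabulated in Figure 3-4, in Kips.
tabulated = {"ALGOL": 362, "BASIC": 330, "BLISS": 249, "FORFOR": 256, "FORTEN": 277}


def kips(seconds, kilo_instructions):
    """Execution rate in units of 1000 instructions per second."""
    return kilo_instructions / seconds


for system, (seconds, count) in crout.items():
    print(f"{system:7s} computed {kips(seconds, count):6.1f}   tabulated {tabulated[system]}")
```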


3.2.3  Subsets  of  the  subject  set 

In some cases it is desirable to study the experimental results from a subject set
representing a subarea of the area of application. Our subject set falls naturally into
three such subsets:

a)  The  compilers. 

b) The numeric set consisting of SEC, Bairstow, Crout, Havie and Aitken.

c)  The  nonnumeric  set,  consisting  of  Treesort,  PERT  and  Ising. 

This  subdivision  is  used  in  Section  5.1. 
CHAPTER 4


REGISTER  STRUCTURE 


We  will  now  discuss  the  motivation  for,  and  costs  associated  with  general  register  designs. 
The  main  problems  we  attack  are: 

a)  What  is  the  optimal  number  of  registers?  This  is  the  most  important  issue  in 
connection  with  register  structure.  All  the  costs  discussed  below  depend  heavily  on 
this  number. 

b)  How  desirable  is  generality?  This  can  be  an  issue  in  some  cases,  particularly  for 
designs  with  a short  instruction  word. 

We  do  not  pretend  to  solve  these  problems,  only  to  present  methods  for  elucidating  them. 

The central concept in our methods is that of a register life. We present an algorithm for
detecting such lives, a method of classifying them according to the types of the events
constituting them, an algorithm to detect simultaneous lives, and finally methods to estimate
the cost of simulating parallel register activity in fewer registers than were used by the
original subject program as traced. The data obtained by these methods are highly relevant
to the problems of register block size and generality. The first few subsections discuss
register structures in general, terminology, and other topics common to the methods.


4.1  The  basic  tradeoffs 


In  old  ISP  designs,  the  arithmetic  registers  that  the  programmer  had  access  to  were  the 
actual  input  registers  to  the  arithmetic  unit.  A typical  design  would  have  an  accumulator  (A 
register),  and  an  extension  of  it  (Q  register)  to  hold  double  length  products  and  dividends, 
quotients,  multipliers,  and  the  like.  The  second  operand  for  arithmetic  would  come  from 
primary  memory.  Further  there  would  be  a number  of  index  registers  which  would  have  a 
restricted  set  of  arithmetic  and  testing  operations.  From  a slightly  different  viewpoint  one 
might  say  that  the  registers  were  divided  into  groups  according  to  criteria  such  as: 
Floating point capability

Full  fixed  point  capability 

Simple  fixed  point  capabilities  and  indexing 

Temporary  storage  only 

etc. 

The  "simple  fixed  point"  group  could  be  those  having  addition  and  subtraction  only,  possibly 
further  restricted  to  immediate  operands  only. 

As electronic circuitry became cheaper and faster compared to primary memory, it became
feasible and common to have a small electronic memory in the central processor for locally
important operands. Operands, as specified by an extra address in the instructions, are
transferred through a switch from these memory cells to the arithmetic input registers,
whereas the latter registers are invisible to the programmer. One or both of the operands
may come from this memory, the alternative being primary memory as before. As a natural
extension, this memory contains not only the arithmetic operands but also the indexes,
control information etc. The terms registers, register set, and in particular general
registers, are now used to mean this local memory.

The  general  registers  commonly  serve  a combination  of  several  functions: 

Arithmetic  registers 
Index  registers 

Base  registers  (double  indexing) 

Subroutine  linkage 

Program  flag  registers  (for  Booleans) 

Stack  pointers 
Address  pointers  (to  data) 

Temporary  data  storage 

Temporary  program  storage  (for  small  loops) 

Program  counter  (PC) 
etc. 

Few, if any, computers have registers with all these properties. In particular, few machines
have the PC in a general register (exception: the PDP-11), and few may execute programs
from them (exception: the PDP-10). The register block may be part of the memory address
space for all functions (as in the PDP-10), just for some (as in the UNIVAC 1107), or not at all
(as in the IBM 360).

We will devote this section mainly to registers for data manipulation. Indexing and
indirection will be discussed, however, to the extent that they are operations involving
registers.

Assuming  that  indices,  if  they  exist  at  all,  are  always  held  in  "registers"  addressable  by  short 
addresses  in  the  instruction  word,  we  may  list  several  factors  that  motivate  the  transition  to 
a general  register  design: 

To save addressing space in the instruction word compared to two address designs. This
is not discussed further in the thesis.

To save code space and instruction executions compared to single accumulator designs. To
estimate this factor is outside the scope of the thesis.

To have a fast store for locally important operands. This is further discussed in Section
4.6.

To have a full complement of operators for indices and control information as well as for
normal arithmetic operands. We discuss this in Section 4.5.

To clean up the ISP architecture and central processor design. This is again motivated
by programming and hardware considerations; to estimate its cost and utility is
outside the scope of this thesis.

The  costs  of  general  registers  are  contributed  by: 

Space  cost  of  lengthened  instruction  words  compared  to  one  address  design.  This 
question  is  not  addressed  in  the  thesis. 

Time  cost  of  load  and  store  instructions  compared  to  a full  two  address  design.  Some  of 
the  results  of  Chapter  5 may  bear  on  this  factor. 

Time cost of saving and restoring registers. This can be reduced by having special
"process swap" or "register save/restore" instructions, or by having separate blocks of
registers for each program or for groups of programs, commonly defined by the interrupt
structure. Hence this cost may or may not apply on interrupts. The cost certainly
applies on subprogram calls, particularly if subprograms are separately compiled*. Again
some of the results from Chapter 5 apply.

Time  cost  of  register  access  switch.  This  time  is  small  compared  to  the  time  gained  by 
not  accessing  primary  memory,  but  may  increase  somewhat  with  the  number  of  registers. 
It  may  be  estimated  from  the  results  in  this  chapter. 

Hardware  cost  of  the  registers  and  the  switch.  To  estimate  this  is  outside  the  scope  of 
the  thesis. 

The  relative  importance  of  these  factors  depends  on  the  state  of  technology.  In  particular 
the  current  trends  towards  cache  memories,  and  towards  larger,  faster  and  cheaper 
electronic  memories,  tend  to  make  the  fast  local  store  argument  less  important.  To  make 
valid  design  decisions  when  faced  with  cost  effectiveness  requirements,  it  is  necessary  first 
to  establish  quantitatively  their  relative  importance  in  a technology  independent  way. 


4.2  Some  definitions 

The  intent  of  these  definitions  is  to  make  precise  the  term  "register  life",  and  to  define  some 
important  properties  of  register  lives. 


* Our analysis of the trace of the BLISS compiler indicates that a "declarable register" is
restored more than 5000 times every second due to subroutine calling; this is the same number
of restores as restoring 16 registers 312 times. A complete process swap would thus have to be
performed over 300 times per second in order for the time cost of register saving due to
process swaps to exceed that due to subroutine calling. We believe this is a high frequency
of process swaps for the PDP-10 (KA10), but not extremely high. Including the "F-register",
the count for BLISS rises to 16500 registers per second, corresponding to about 1000
process swaps per second. (This is about 1.15 registers saved per routine call.) The
"temporary registers" are not included at all in these counts. Measurements performed on the
IBM 360/91 indicate about 470 SVCs and I/O interrupts per second. Assuming the 360/91 to
be ten times as fast as the KA10, this corresponds to about 50 process swaps per second on
the KA10. All this indicates that register saving because of routine calls is significantly more
costly than register saving due to process swaps.
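
The break-even figures in the footnote above follow from dividing the observed register restore rates by the 16 registers of a full block. A minimal Python sketch of that arithmetic, with names chosen here purely for illustration:

```python
REGISTERS_PER_SWAP = 16  # a complete process swap restores the whole register block


def equivalent_swaps_per_second(registers_restored_per_second, block=REGISTERS_PER_SWAP):
    """Process swaps per second that would restore the same number of registers."""
    return registers_restored_per_second / block


declarable_only = equivalent_swaps_per_second(5000)    # "declarable registers" only
with_f_register = equivalent_swaps_per_second(16500)   # including the "F-register"

# 360/91 measurement scaled to the KA10, assuming the factor of ten stated in the footnote.
scaled_360_91 = 470 / 10

print(declarable_only, with_f_register, scaled_360_91)
```

Both break-even rates lie well above the scaled 360/91 figure, which is the comparison the footnote draws.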
A register is loaded when a new value is brought into it that is unrelated to its previous
value  (except  for  possible  use  of  the  old  value  in  the  address  calculation). 

A register  is  modified  when  a new  value  is  brought  into  it  which  is  the  result  of  an 
operation  involving  the  old  value  as  one  of  its  operands. 

A register  is  used  when  it  is  loaded,  modified,  employed  in  address  calculation,  used  as  an 
operand,  stored,  tested  or  otherwise  referenced  from  an  instruction. 

A register  is  read  when  it  is  used  but  not  modified  or  loaded. 

Since  our  finest  grain  of  time  is  that  of  one  instruction,  a register  may  be  loaded  and 
otherwise  used  at  the  same  time.  In  a finer  time  scale  this  would  not  be  so.  Hence  we 
regard  the  sets  of  loadings,  modifications  and  readings  of  a register  as  disjoint.  Their  union 
is  the  set  of  all  usages  of  that  register.  Two  other  subsets  are  often  needed: 


A register is changed when it is modified or loaded; it is accessed when it is read or
modified.
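
The relations between these kinds of usage can be stated compactly. The following Python sketch is illustrative only; the names `Reference`, `CHANGED` and `ACCESSED` are introduced here and do not come from the analysis program.

```python
from enum import Enum


class Reference(Enum):
    LOADED = "loaded"      # new value unrelated to the previous one
    MODIFIED = "modified"  # new value computed from the previous one
    READ = "read"          # used, but neither loaded nor modified


# The two derived subsets named in the text.
CHANGED = {Reference.LOADED, Reference.MODIFIED}
ACCESSED = {Reference.READ, Reference.MODIFIED}


def is_changed(ref):
    return ref in CHANGED


def is_accessed(ref):
    return ref in ACCESSED


# A hypothetical sequence of usages of one register, one entry per instruction.
history = [Reference.LOADED, Reference.READ, Reference.MODIFIED, Reference.READ]
print([r.value for r in history if is_changed(r)])
print([r.value for r in history if is_accessed(r)])
```

Loadings, modifications and readings partition the usages, while "changed" and "accessed" overlap exactly in the modifications.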


A register life (R-life) for a given register is the span of time starting when the register
is loaded and ending with the last access before the next time it is loaded*. If a register
is used in the address calculation of a load to itself, this use is regarded as an access in
the life prior to the loading.

Typically a register life starts with a LOAD; operations like ADD, STORE, SHIFT etc. may
reference the register and possibly modify it during its life, and it may be used as a
stackpointer, indirect address etc.

The  initial  loading  usage  in  a register  life  is  called  its  first  use,  the  term  last  use  has  an 
equally  obvious  definition.  The  first  and  last  uses  of  an  R-life  constitute  its  transitions. 
The  length  of  an  R-life  is  the  time  from  its  first  use  to  its  last  use,  both  endpoints 
included. 




* An  R-life  should  be  thought  of  as  closely  related  to  its  register.  Formally  this  could  be 
incorporated  into  the  definition  by  defining  an  R-life  to  be  a triple:  <Register  name,  time  of 
load,  time  of  last  use>. 
A register is live during an R-life for that register. It is dead when it is not live. It is
dormant when it is live but has not been used for some long period of time specified in
each actual case.

We  emphasize  that  we  are  observing  the  dynamic  behaviour  of  programs,  hence  the 
observed  R-lives  are  in  general  different  from  those  that  we  would  observe  by  a static  study 
of  the  code  between  the  instructions  responsible  for  the  first  and  last  uses,  and  the  usages 
of  a register  during  its  life  may  involve  instructions  from  quite  remote  parts  of  the  code. 

The  following  definitions  are  introduced  in  order  that  we  may  classify  R-lives  according  to 
the  kinds  of  operations  they  have  been  used  for.  This  will  be  used  to  assess  the  need  for 
generality  of  registers. 

A register  usage  classification  is  a set  of  possible  modes  or  attributes,  each  describing  a 
different  way  in  which  a register  may  be  used  by  an  instruction. 

A simple classification could be: {<loaded>, <stored>, <used for integer arithmetic>, <used for
real arithmetic>, <used otherwise>}. A more complete classification is presented in Section
4.3.


A register  usage  attribute  is  a member  of  a register  usage  classification.  The  above 
classification  has  5 attributes:  <loaded>,  <stored>,  etc. 


A register  usage  class  is  a set  of  register  usage  attributes,  i.e.  a subset  of  the  register 
usage  classification. 

When  no  confusion  can  arise,  the  word  "register"  is  usually  omitted  from  the  above  3 terms. 

Each  R-life  has  a usage  class  associated  with  it,  which  is  uniquely  defined  by  the  (unordered) 
set  of  usages  of  the  register  during  its  life.  We  will  usually  use  the  term  to  denote  a class 
defined  in  this  way. 

A register  usage  classification  is  in  a sense  a generalization  of  the  set  of  instructions  and 
other  basic  operations  of  the  processor  which  involve  the  registers.  It  may  also  be  thought 
of  as  a classification  of  the  instructions  of  the  ISP  in  terms  of  how  they  use  registers.  Given 
an  opcode  and  a field  of  the  instruction  word  which  may  specify  a register,  a usage  attribute 
is  true  or  false  depending  on  whether  that  instruction  uses  the  register  specified  by  that 
field in that particular mode. This is in fact the way it is represented in our analysis
program.


4.3  A register  usage  classification 

In Figure 4-1 and Appendix C we present a register usage classification for the PDP-10.
It is designed to detect the loading, modification and reading of registers, as well as the
various forms of reading or modification. This classification was used in our analysis
programs to detect and classify R-lives. Although it is designed for a particular ISP, few and
obvious modifications would be necessary to use it for any other register oriented ISP*.

This  classification  grew  and  generalized  as  we  were  working  with  it.  Our  experience  is  that 
the  classification  given  in  Figure  4-1  is  satisfactory.  It  contains  three  minor  improvements 
over  the  one  we  actually  used  for  our  analyses.  The  "Used  as  operand"  and  "Immediate 
fixpoint  add  or  subtract"  attributes  were  included  post  hoc.  Also,  our  analysis  program  did 
not  check  for  instruction  fetches  from  registers,  only  for  jumps  into  registers  or  XCT** 
instructions  addressing  registers.  The  errors  caused  by  this  omission  are  considered 
insignificant.

For technical reasons the machine representation of the register usage attributes separates
them into two kinds, reference attributes and access attributes. Reference attributes are
used to define the three major types of reference, i.e. loading, modification or reading.
They are used by the analysis programs as case selectors, and hence represented as
consecutive values. The access attributes are used to accumulate the types of usage of a
register during its R-life. They are represented as bit positions in a field, so that they may
be easily included into a register usage class by OR-ing.
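
The two representations can be sketched as follows. This is an illustration in Python, not the representation used in the analysis program: the numeric values, the bit order, the identifier names and the choice of only five access attributes from Figure 4-1 are all assumptions made here.

```python
from enum import IntEnum, IntFlag, auto


class ReferenceKind(IntEnum):
    """Reference attributes: consecutive values, usable as case selectors."""
    NOT_USED = 0
    LOADED = 1
    MODIFIED = 2
    USED_NOT_MODIFIED = 3
    UNDEFINED = 4


class Access(IntFlag):
    """A few access attributes, one bit position each (bit order is arbitrary here)."""
    INDEXING_DATA = auto()
    FIXPOINT_ADD_SUB_MEMORY = auto()
    FLOATING_POINT = auto()
    TESTED = auto()
    STORED = auto()


def usage_class(accesses):
    """Accumulate the usage class of one R-life by OR-ing its access attributes."""
    result = Access(0)
    for access in accesses:
        result |= access
    return result


# Hypothetical accesses observed during one R-life; repeated attributes collapse.
life_accesses = [Access.FIXPOINT_ADD_SUB_MEMORY, Access.TESTED,
                 Access.FIXPOINT_ADD_SUB_MEMORY, Access.STORED]
life_class = usage_class(life_accesses)
print(Access.TESTED in life_class, Access.FLOATING_POINT in life_class)
```

Because OR-ing is idempotent and order independent, the resulting class depends only on the unordered set of usages, in line with the definition of a usage class in Section 4.2.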

Since  tnere  are  3 fields  in  each  instruction  word  of  the  PDP-10  which  may  reference  a 
register,  the  actual  description  of  each  instruction  consists  of  3 sets  of  attributes,  each 
corresponding  to  one  of  these  fields  and  the  different  ways  it  may  use  a register.  Further 
complication  follows  from  the  existence  of  instructions  which  reference  two  registers  by  the 
"ACC"  field,  from  the  special  treatment  of  register  0 by  many  instructions,  and  from  the 


* For  example,  if  analyzing  the  PDP-11,  autoincrement  might  be  introduced  as  an  attribute. 

♦♦  Execute  contents  of  effective  address 
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A register  usage  classification. 


Reference  attributes: 

Not  used 

Loaded 

Modified 

Used  but  not  modified 

Undefined  (Monitor  communication  etc.) 

Access  attributes: 

Indexing  data  accesses 

Indexing  jumps  or  executes 

Indexing  immediate  operands 

Immediate  fixpoint  add  or  subtract 

Fixpoint  add  or  subtract  w.  memory  operand 

Fixpoint  multiply  or  divide 

Floating  point  arithmetic 

Halfword  modified 

Byte  loaded  or  stored 

Modified  by  logical  operation 

Modified  by  shift 

Used  as  stackpointer 

Used  to  hold  an  address  (As  in  Block  transfers  etc.) 
Tested 

Used  for  monitor  parameter 
Used  as  byte  pointer 
Used  as  indirect  address 
Used  as  an  operand 
Stored 

Executed  (XCT’ed*  or  fetched  as  an  instruction) 


result  to  memory  mode  of  many  PDP-10  instructions.  These  complications  affect  the 
reference  attributes,  hence  corresponding  code  has  to  be  built  into  the  analysis  program.  In 
Figure  4-1  we  described  the  classification  as  independent  of  these  complicating  matters.  The 
full  classification,  as  we  used  it,  is  reproduced  in  Appendix  C. 


† I.e. referenced by an execute instruction


4.4 Register life detection

In order to say anything beyond trivialities about register usage, it is necessary to detect
the register lives. The following simple algorithm will do this in one scan over the trace. A
register usage classification is needed which includes at least the attributes "loaded" and
"accessed". As the trace is read, the algorithm keeps for each register the times of its most
recent load and use. For each instruction in the trace, all fields that can possibly reference a
register have to be examined with this in mind. Whenever the register is loaded anew, or at
the end of analysis, the transitions of its most recent R-life are the most recent load and use
respectively. In our experiments we used the instruction count as our time measure; the
computed time could equally well be used.
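
The one-scan detection can be sketched as below. This is an illustration under stated assumptions, not the original program: the trace is assumed to be already decoded into (register, "load" or "use") references per instruction, the time measure is the instruction count, and uses of a register that was never loaded within the trace are simply ignored.

```python
from typing import Dict, Iterable, List, Tuple

Ref = Tuple[int, str]              # (register number, "load" or "use")
Life = Tuple[int, int, int, int]   # (register, load time, last use time, number of uses)

def detect_lives(trace: Iterable[List[Ref]]) -> List[Life]:
    """Detect register lives in a single forward scan over the trace."""
    lives: List[Life] = []
    current: Dict[int, List[int]] = {}   # register -> [load time, last use, uses]
    for time, refs in enumerate(trace):
        # Uses are handled first, so an instruction that reads a register
        # and then reloads it closes the old life before opening the new one.
        for reg, kind in refs:
            if kind == "use" and reg in current:
                current[reg][1] = time
                current[reg][2] += 1
        for reg, kind in refs:
            if kind == "load":
                if reg in current:
                    load, last, uses = current[reg]
                    lives.append((reg, load, last, uses))
                current[reg] = [time, time, 0]
    # At the end of the analysis the still-open lives are closed.
    for reg, (load, last, uses) in sorted(current.items()):
        lives.append((reg, load, last, uses))
    return lives

trace = [
    [(1, "load")],              # t = 0
    [(1, "use"), (2, "load")],  # t = 1
    [(2, "use")],               # t = 2
    [(1, "load")],              # t = 3
    [(1, "use"), (2, "use")],   # t = 4
]
lives = detect_lives(trace)
# lives == [(1, 0, 1, 1), (1, 3, 4, 1), (2, 1, 4, 2)]
```

The length and the number of usages of each life fall out of the same scan, which is why the further statistics can be accumulated at no extra pass over the trace.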

As  each  R-life  is  detected,  its  length  is  immediately  known.  Similarly  the  number  of 
references  to  each  R-life,  the  number  of  memory  and  register  references  etc.  are  easily 
accumulated  by  this  algorithm. 

Distributions of lifelengths and usages per R-life from a typical analysis run are shown in
Figure 4-2. Because short lives dominate but there is a significant number of long
ones, a logarithmic division was used in the table. These results are too voluminous to
present in full for all of our subject programs. In Figure 4-3 we tabulate for each subject
program what fractions of all the lives are accounted for by lives of lengths at most 7, 15
and 31 instructions. Similarly in Figure 4-4 we tabulate the fractions of all lives that are
accounted for by lives with at most 3, 7 or 15 usages.

A summary of other results of this algorithm from analyzing our subject programs is shown
in Figures 4-5 through 4-11. All these results were obtained under the assumption that a
register was dead when it had been dormant for 200 instructions. The reason for this
assumption, and a discussion of its consequences, is given in Section 4.6. For the present
results it means that a few lives (the exact number is tabulated in Figure 4-26) are
considered as two or more, with correspondingly shorter lives and fewer references per life.

This algorithm is critically dependent on the ability to define the "load" and "access" usage
attributes with the intended intuitive meaning. Certain instruction sequences, like HRR, HRL†


† These instructions load the right and left halves of a register respectively, leaving the
other half unchanged. Alone they were considered modifying instructions; however, HRRZ
etc., which explicitly change the whole register, were considered loading.
FIGURE 4-2

Distributions of lifelengths and usages per R-life
(FORFOR compiling Treesort)

FIGURE 4-3

FIGURE 4-4

Fraction of lives used at most 3 times, at most 7 times, and at most 15 times

FIGURE 4-5

Number of register lives

The high number of R-lives for the FORFOR and ALGOL versions of Crout, compared to the
BLISS  version,  is  probably  due  to  the  use  of  double  length  arithmetic  in  those  versions. 
Similarly  the  high  number  of  register  lives  for  the  ALGOL  versions  of  Havie  and  Ising  is 
probably  due  to  the  large  number  of  procedure  and  name  parameter  calls. 


FIGURE  4-6 

Average  lifelength  in  instructions 
FIGURE 4-7

Usages per R-life

FIGURE 4-8

Average number of live registers

FIGURE 4-9

Memory references per instruction


Algorithm\language       ALGOL   BASIC   BLISS   FORFOR   FORTEN

Bairstow                 0.61    0.52    0.50    0.62     0.60
Crout                    0.44    0.59    0.50    0.55     0.64
Treesort                 0.65    0.50    0.51    0.57     0.63
PERT                     0.51    0.47    0.53    0.69     0.63
Havie                    0.30    0.45    0.31    0.44     0.35
Ising                    0.40    -       0.60    0.67     0.60
Secant                   -       -       -       0.60     0.53

Algorithm\Programmer     E       B       A       G        L

Aitken                   0.45    0.48    0.52    0.50     0.53

Source progr.\Compiler   ALGOL   BASIC   BLISS   FORFOR   FORTEN

Treesort                 0.40    0.32    0.45    0.42     0.40

The  instruction  fetches  are  not  included  in  the  memory  reference  counts. 


FIGURE  4-10 

Register  references  per  instruction 
FIGURE 4-11

Register references per memory reference


Algorithm\language       ALGOL   BASIC   BLISS   FORFOR   FORTEN

Bairstow                 2.7     2.0     3.2     2.2      2.3
Crout                    3.8     2.1     3.3     2.8      2.3
Treesort                 2.5     2.1     3.2     2.2      2.1
PERT                     3.1     2.2     3.0     1.8      1.9
Havie                    5.2     2.5     5.2     3.1      3.3
Ising                    4.0     -       2.8     1.7      1.9
Secant                   -       -       -       2.3      2.1

Algorithm\Programmer     E       B       A       G        L

Aitken                   3.7     3.5     3.3     3.4      3.1

Source progr.\Compiler   ALGOL   BASIC   BLISS   FORFOR   FORTEN

Treesort                 2.7     3.5     2.9     3.3      2.9


on  the  PDP-10  effectively  constitute  a load,  but  usages  of  these  instructions  in  other  cases 
do  not.  As  a consequence,  some  lives  may  not  be  properly  detected. 

A comparison of the results of our sequence program, as described in Section 5.2, with
the listing of the ALGOL run time support system, seems to indicate that this source of error
may be significant for our ALGOL programs, particularly Crout, Havie and Ising, which contain
many procedure calls and name parameter transmissions. For the compilers traced there are
many halfword loads, but no significant pairs of halfword loads, and for the other programs
there are no danger signs in our results.


4.4.1  Summary 

We  summarize  these  initial  results  as  follows: 

Register lives are in general short, less than 32 instructions. Only for 3 of our 41 subject
programs are more than 10% of the R-lives 32 instructions or longer, and for 11 of the
programs 99% of the lives are shorter than 32 instructions. The average lifelength is less
than 24 instructions for all programs, less than 15 for 32 of them and less than 10
instructions for 14 programs. These results vary systematically with the algorithm; PERT and
Treesort have short lives, Havie has long lives. The BASIC programs form an exception: they
all have lifelengths between 11.2 and 12.3 instructions.

The average number of usages per life varies between 3.1 (FORFOR PERT, FORFOR Ising) and
0.6 (BLISS Treesort). Again the results from the BASIC programs vary little with algorithm
(3.4 to 3.[illegible]); the other results vary more with the algorithm, but not very systematically
except for the two FORTRAN versions. These correlate well with each other.

The average number of live registers is less than 7 for all 41 programs, 4 or less for 24 of
them. ALGOL programs generally keep more registers live than do programs in the other
languages (See footnote on page 74). The results from the BASIC programs again vary
little with the algorithm. The correlation between the FORTRAN versions is not as good as
for the lifelengths and the usages per life.

The high ratio of register references to memory references suggests that those registers
which are live are effectively used for temporary results.




The  influence  of  language  and  algorithm  is  not  clear.  Generally  results  from  the  BASIC 
programs  are  almost  independent  of  the  algorithm,  and  the  ALGOL  results  often  show  a 
consistent  trend,  but  with  some  variation.  In  some  cases  the  correlation  between  the  two 
FORTRAN  versions  is  good.  This  indicates  that  the  differences  found  are  due  to  language  and 
not  to  implementation.  Variations  due  to  the  programmer  are  marked,  as  witnessed  by  the 
results  from  Aitken. 


4.5  Register  life  classification 


Specialization of registers may seem irrelevant in view of the current tendency towards
general register structures, and the consequent increased generality of ISP and program
structure. However, specialization may be of relevance in short wordlength computers,
where the addressing space saved by omitting register addresses can be used for more
important capabilities.


To assess the utility of a full set of operators for each register we need to know which kinds
of operations are performed on a register during its R-life. One way of obtaining this
information is to use a finer register usage classification than the "loaded", "accessed" one
sufficient to determine the lives†, and to extend the life detection algorithm to compute the
usage class for each R-life. That is: at each usage of an R-life the appropriate usage
attribute is included in the usage class. Hence the number of R-lives in each usage class may
be accumulated.

This method for classifying R-lives has two variants. One is to accumulate the usage classes
strictly for one register life. The other is, for binary operations, to let the usage class of
the result become the union of the classes of the operands. The former is most relevant
when we analyze a structure with very general registers to detect unneeded generality; the
second variant can be used on an ISP with specialized registers to see the need for a more
general structure. Our experimental results were obtained by the former variant.

The  information  may  be  tabulated  by  the  register  number,  allowing  us  to  see  for  each 
physical  register  how  it  was  used.  More  interesting  is  to  tabulate,  for  each  usage  class, 
statistics  on  the  number  of  lives  in  each  class,  their  average  length  and  number  of  usages. 
We  call  this  the  usage  class  table  or  UCT. 
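
A usage class table of this kind can be sketched as follows. The sketch assumes that each detected R-life has already been reduced to a triple (usage class as a bit mask, lifelength, number of usages); the function name and the row layout are hypothetical, not those of the original program.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

def build_uct(lives: Iterable[Tuple[int, int, int]]) -> List[Tuple[int, int, float, float]]:
    """Return rows (usage class, number of lives, average length, average usages),
    sorted by decreasing number of lives in the class."""
    totals = defaultdict(lambda: [0, 0, 0])   # class -> [lives, total length, total usages]
    for usage_class, length, usages in lives:
        entry = totals[usage_class]
        entry[0] += 1
        entry[1] += length
        entry[2] += usages
    rows = [(cls, n, total_len / n, total_use / n)
            for cls, (n, total_len, total_use) in totals.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Three hypothetical R-lives: two in class 0b011, one in class 0b101.
uct = build_uct([(0b011, 4, 2), (0b011, 8, 4), (0b101, 1, 1)])
# uct == [(3, 2, 6.0, 3.0), (5, 1, 1.0, 1.0)]
```

The table has one row per distinct usage class, so its size is bounded by the number of classes that actually occur rather than by the length of the trace.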

None of our analyses showed more than 200 different usage classes. About half of these
account for more than 99% of the total number of lives. Hence the UCT forms a very compact
database  describing  the  register  usage,  which  can  be  manipulated  or  stored  for  later  use  at  a 
low  cost.  A natural  format  is  to  store  the  UCT  sorted  by  the  number  of  lives  in  the  class,  or 
by  the  sum  of  the  lifelengths  represented  by  the  class.  Thus  we  may  cheaply  ask  questions 
that  were  not  thought  of  at  the  time  of  the  original  analysis  and,  in  particular,  we  may  study 
that  UCT  which  is  the  union  of  all  the  UCTs  of  the  individual  subject  programs.  Unfortunately 
it  was  not  realized  until  a late  stage  in  our  experiments  that  the  UCTs  would  be  small.  Hence 
we  have  not  saved  the  UCTs  from  our  analyses. 

Several forms of output may be obtained from the UCT. A very simpleminded output
procedure, which takes usage classes as its parameters, can be employed to print data
pertaining to all classes that are subsets of, supersets of, or other simple combinations of the
classes given as parameters. In this way we may obtain statistics on the usage classes a
priori thought to be significant. Another procedure may be used to find combinations of
attributes that frequently occur in the same usage class. The result of such an analysis will
be an a posteriori classification of the R-lives corresponding to suitable types of more
specialized registers.
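
With usage classes held as bit masks, the subset and superset selections reduce to simple mask tests. The following is a minimal sketch; the function names and the toy table (usage class mapped to number of lives) are assumptions for illustration.

```python
from typing import Dict

def subsets_of(uct: Dict[int, int], pattern: int) -> Dict[int, int]:
    """Classes whose attributes all occur in the pattern."""
    return {cls: n for cls, n in uct.items() if cls | pattern == pattern}

def supersets_of(uct: Dict[int, int], pattern: int) -> Dict[int, int]:
    """Classes that contain every attribute of the pattern."""
    return {cls: n for cls, n in uct.items() if cls & pattern == pattern}

# Hypothetical table: usage class -> number of lives in that class.
toy_uct = {0b001: 50, 0b011: 30, 0b110: 15, 0b111: 5}
only_low_bits = subsets_of(toy_uct, 0b011)     # {0b001: 50, 0b011: 30}
with_high_bits = supersets_of(toy_uct, 0b110)  # {0b110: 15, 0b111: 5}
fraction = sum(with_high_bits.values()) / sum(toy_uct.values())  # 0.2
```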


† The one in Section 4.3 is a typical example

In our case, we believed a priori that the classification into floating point accumulators, fixed
point accumulators, index registers with simple arithmetic capabilities and temporary storage
only, is of such a significance (See page 41). This belief is well founded in history. We
display the fraction of lives in each of these arithmetic classes in Figures 4-12 through
4-15. Each class is defined by the "strongest" form of arithmetic used in it, floating point
being stronger than fixed point multiply and divide, which again is stronger than fixed point
add and subtract. R-lives not used for arithmetic may still be used for logical or other
operations. These four classes are disjoint. We denote them: Floating, Fixed, Counter, and
Noari.
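
The "strongest arithmetic" rule can be written as a short cascade of tests. This is a sketch only: the three bit values are hypothetical stand-ins for the corresponding access attributes of the usage classification.

```python
# Hypothetical bit assignments for the three arithmetic attributes.
FLOATING_ARITH = 0b001   # floating point arithmetic
FIX_MUL_DIV = 0b010      # fixpoint multiply or divide
FIX_ADD_SUB = 0b100      # fixpoint add or subtract

def arithmetic_class(usage_class: int) -> str:
    """Classify an R-life by the strongest form of arithmetic in its usage class."""
    if usage_class & FLOATING_ARITH:
        return "Floating"
    if usage_class & FIX_MUL_DIV:
        return "Fixed"
    if usage_class & FIX_ADD_SUB:
        return "Counter"
    return "Noari"

mixed = arithmetic_class(FIX_MUL_DIV | FIX_ADD_SUB)   # "Fixed"
```

Testing the strongest attribute first is what makes the four classes disjoint: a life that both multiplies and adds is counted once, under Fixed.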

Some  other  classes  were  also  thought  to  be  of  interest.  The  fractions  of  R-lives  that  were 
used  only  as  storage  locations  are  tabulated  in  Figure  4-16,  this  class  is  denoted 
Temporary.  The  fractions  of  R-lives  used  for  indexing  (whether  for  data  accessing,  jumps  or 
immediate  operands)  are  tabulated  in  Figure  4-17.  This  class  is  not  disjoint  from  the 
arithmetic  classes,  and  is  denoted  Indexing. 

Yet another classification of interest is the intersection of the indexing class with the
arithmetic classes. We have no concise results for these classes, except the printout of
statistics for all indexing classes discussed below.

An output procedure as described above was programmed to print the number of lives,
fraction of total number of lives, average lifelength and an interpretation of the usage class
encoding, for the selected set of classes. It was used to print the whole of the UCT as well
as the subclasses for arithmetic and indexing discussed above. An example of this output is
given in Appendix B.

A study  of  these  printouts  brought  up  several  questions  which  could  not  be  quantitatively 
investigated  since  we  did  not  have  access  to  the  old  UCTs.  We  formulated  several 
hypotheses,  however,  and  checked  them  manually  in  a scan  over  all  the  printed  results. 

1) A significant number of lives are of length one. This was verified. Some partial
explanations could be: Values of subroutines returned in registers but not used at the
call site. Double length results of integer multiplication and two results of division
(quotient and remainder) where only one is used. Line numbers of BASIC programs are
loaded into a register for each source line executed; these are used only when errors are
detected.
FIGURE 4-12

Fraction of lives with no arithmetic

FIGURE 4-13

Fraction of lives with fixed point add/subtract
Class Counter

FIGURE 4-14

Fraction of lives with fixed point multiply/divide
Class Fixed

FIGURE 4-15

Fraction of lives with floating point arithmetic

FIGURE 4-16

Fraction of R-lives used as temporaries only
Class Temporary

FIGURE 4-17

Fraction of lives used for indexing
Class Indexing

2) A significant fraction of the R-lives are never stored. This hypothesis was verified for all
subject programs. It clearly demonstrates that registers are not only needed to produce
results, but also as indices and fast temporary storage.

3)  The  usage  classes  representing  most  lives  have  few  attributes,  i.e.  2 or  3.  This 
hypothesis  was  verified  in  all  subject  programs.  It  supports  the  idea  put  forward  by 
Knuth  [KnuD70],  that  programmers  rarely  do  anything  complicated. 

4)  Most  lives  for  indexing  use  no  arithmetic  at  all.  This  was  true  in  most  cases,  but  with 
notable  exceptions. 

5)  Most  lives  used  for  indexing  have  no  arithmetic  stronger  than  fixed  point  add  and 
subtract.  Largely  verified,  but  strong  exceptions.  Particularly  noteworthy  was  the  Crout 
algorithm,  the  only  one  where  two  dimensional  arrays  were  used.  There  was  a great 
difference  between  programs  using  a multiplicative  address  calculation  (dope  vectors) 
(FORTRAN  and  BLISS  versions)  and  those  using  Iliffe  vectors  (ALGOL  version)  for  array 
accessing. 

6) Lives used for floating point arithmetic rarely use fixed point arithmetic. True for all
subject programs that have a significant amount of floating point arithmetic. The
indications were that the exceptions were usages for fixed to floating conversion or vice
versa, largely occurring in the initialization phases of our programs.

Another observation was that most usage classes, although not the most frequent ones,
contained the "tested" attribute.

An  obvious  source  of  error  with  this  method  is  its  dependence  on  the  correct  detection  of  R- 
lives,  as  discussed  on  page  49.  As  noted  there,  this  error  may  be  significant  for  some  of  our 
ALGOL  programs. 

Another deficiency is that the representation of a usage class does not take into account that
some attributes may contribute to the class many more times than others. The algorithm
could be augmented to compute the number of occurrences of each usage attribute while
accumulating the class of an R-life. Even if these counts were averaged over the lives in
each usage class, one word of storage would be required for each combination of attribute
and usage class, i.e. at least 4000 words. Since most lives are short and of few usages, we
believe that this addition to the algorithm does not justify its cost. We believe that the trend
of such results would be that the infrequent events are even less frequent than shown by
our present methods.


4.5.1  Summary 

The results in Figures 4-12 to 4-17 lead us to the following conclusions:

For algorithms containing floating point arithmetic, up to 42% of the R-lives are from the
"Floating" class, but usually considerably fewer: 20% to 37%. The BASIC programs form an
exception: even though all arithmetic in BASIC is done in floating point, at most 26% of the
R-lives are from this class. Except for BASIC programs, there is a systematic variation with the
algorithm.

Lives  with  fixed  point  multiplication  and  division  occur  almost  only  in  the  programs  that  use 
the  multiplicative  method  for  matrix  access,  or  that  use  integer  division  for  unpacking.  Hence 
the  dependence  on  algorithm  is  marked,  but  less  so  than  for  the  "Floating"  class,  and 
particular  techniques  used  by  or  enforced  by  the  language  or  its  implementation  become 
significant. 

For the other classes, the interaction of the needs of the algorithm with the register
allocation mechanism of the compilers obscures any systematic effects due to each of these
factors singly. There is, however, some more stability to the results from the ALGOL and
BASIC programs than from the others. This is most probably due to the run time system of
ALGOL and to the lack of integer arithmetic in BASIC.

ALGOL programs have a high number of lives in the "Counter" class (30% to 50% of the lives);
BASIC programs have a very large number of lives with no arithmetic (63% to 74%). ALGOL
programs also have a high number of lives in this class (21% to 69%).

48% to 59% of the R-lives in ALGOL programs are used for indexing. The fraction of indexing
lives is also high in BLISS programs (23% to 68%) and BASIC programs (37% to 42%), but not
consistently. For the FORTRAN programs this fraction varies between 19% and 49%; the
agreement between the two FORTRAN versions is good.




For the "Temporary" class, the results vary between 0 and 28%. For ALGOL programs the
results are consistently low, [illegible] to 7.2%. For BASIC programs they are high: 6.7% to 28%.

The  substance  of  these  results  is:  The  classes  for  strong  arithmetic  are  used  only  if  the 
algorithm  or  the  accessing  method  used  by  the  compiler  requires  such  arithmetic.  Hence  for 
these  classes  the  dependence  on  the  algorithm  is  strong.  In  the  classes  for  weak  and  no 
arithmetic  the  results  seem  to  depend  more  on  the  language,  particularly  for  those  languages 
which  enforce  a strong  regimen  on  their  programs,  such  as  ALGOL  by  its  run  time  system  and 
BASIC  by  its  restriction  to  floating  arithmetic  and  by  its  strictly  statement  by  statement 
execution  (no  information  is  carried  in  registers  between  source  program  lines). 

These findings corroborate those of Alexander [AleW72], which indicate that two or three of
the physical registers on the IBM 360 are used as accumulators, whereas most of them are
used as indices or base registers.

The results for the FORTRAN and BLISS programs show little systematic variation except for
a good agreement between the FORTRAN versions of the same algorithm.


4.6  Register  block  size 

The  results  presented  in  Figure  4-9  through  Figure  4-11  indicate  that  for  our  subject  set  the 
number  of  register  references  is  between  two  and  three  times  the  number  of  memory 
references.  Hence  the  need  for  a register  block  is  well  demonstrated  by  experiment,  as  well 
as  being  motivated  by  programmer  experience.  The  problem  is  more  one  of  size,  i.e.  how 
many  registers  can  be  utilized  efficiently  enough  to  warrant  their  cost.  In  addition  to  its 
obvious  dependence  on  the  other  properties  of  the  ISP,  this  number  depends  on  the 
structure  of  the  algorithm,  the  cleverness  of  the  programmer  and  the  compiler  and  the 
fineness  of  the  factorization  of  the  program.  The  combined  effect  of  these  factors  is 
represented  by  our  subject  set. 

We  now  present  a sequence  of  methods  which  in  a gradually  better  way  measure  the  utility 
of  the  register  block  and  the  time  costs  associated  with  its  usage. 

We  have  already  presented  some  crude  measures  in  Section  4.4:  The  number  of  memory  and 
register  references  per  instruction  presented  in  figures  4-9  through  4-11  are  of  relevance, 
another  measure  is  the  average  number  of  live  registers  in  Figure  4-8. 
Some better measures could be developed if we knew the number of registers that are
live at each point in the program. In the next subsection we present an algorithm for
computing this. This algorithm is extended to compute, for any N, what fraction of the time at
least N registers were live, and finally to give a coarse estimate of the time cost incurred if
the number of registers were reduced below the maximum used by the program. This
estimate is based on the number of usages in each R-life. A further improvement takes into
account long dormant periods of registers. We now describe these algorithms, the associated
cost measures, and the experimental results, in more detail.
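
To make the "at least N registers live" measure concrete, here is a small sketch. It is not the two stage program described in the next subsection: it is a plain forward sweep over R-lives that are assumed to be held in memory as (first use, last use) pairs, with a register counted as live from its first use up to, but not including, its last use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def live_histogram(lives: Iterable[Tuple[int, int]], duration: int) -> Dict[int, int]:
    """Map k to the amount of time during which exactly k registers were live."""
    events = []
    for first, last in lives:
        events.append((first, +1))
        events.append((last, -1))
    events.sort()
    exact = defaultdict(int)
    live, prev = 0, 0
    for time, delta in events:
        exact[live] += time - prev   # time spent at the current live count
        prev = time
        live += delta
    exact[live] += duration - prev
    return dict(exact)

def fraction_at_least(exact: Dict[int, int], n: int, duration: int) -> float:
    """Fraction of the run during which at least n registers were live."""
    return sum(t for k, t in exact.items() if k >= n) / duration

hist = live_histogram([(0, 10), (2, 6), (4, 8)], duration=10)
at_least_two = fraction_at_least(hist, 2, 10)   # 0.6
```

In this toy run three registers are live for 2 of the 10 time units and at least two for 6 of them, so a machine with two registers would be short of a register 20% of the time.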


4.6.1  Detecting  simultaneous  lives 

The algorithms are embodied in a two stage (or pass) program. The first stage reads the
trace and writes an intermediate file of data items describing each R-life. This file is
processed in the reverse order by the second stage. The algorithms are described below,
and illustrated by an example in Figure 4-18.

The  first  stage  is  actually  the  algorithm  which  detects  register  lives,  described  in  Section  4.4, 
with  a minor  addition:  As  each  R-life  is  determined,  (at  the  start  of  the  next  R-life  for  that 
register),  a data  item  containing  the  times  of  its  transitions,  its  usage  class,  number  of  usages 
etc.  is  written  to  the  intermediate  file. 

The second stage reads this file backwards while maintaining a simulated time (s-time) which
decreases as the algorithm proceeds. Initially the s-time is the duration of the program; later
it is equal to the time of the transition most recently processed by the algorithm as
described below.

The stage two program keeps a data entry describing the state (live or dead) of each
physical register. There is also a counter of live registers, and a linked list of at most two
entries (each describing an unprocessed transition) per physical register, as described below.

Initially the second stage reads the data items describing the last R-life for each register, and
enters the transitions in the list, sorted by decreasing time. The algorithm proceeds by
processing the transition first on the list, i.e. that having the highest time. Current s-time is
set to this time, and the table and counter are updated according to the nature of the
transition. If the transition was a first use, we have finished processing an R-life. The next
data item for that register is immediately read from the file (see below), and its transitions
are entered in the list. Hence when the analysis is under way, the list contains one transition
for each live register (i.e. its first use), and both transitions for the other registers (whose
data items have been read, but whose times of last use are less than the current s-time).

Note  that,  by  the  way  the  intermediate  file  was  written,  its  data  items  are  ordered  by  the 
time  of  first  use  of  the  next  (later  in  execution  time)  R-life  of  the  register  involved.  When 
the  file  is  read  backwards  by  stage  2,  one  item  is  read  each  time  a first  use  has  been 
processed.  The  item  read  is  the  one  that  was  output  by  stage  one  at  that  point  of  the  trace 
when  the  execution  time  of  the  subject  program  was  equal  to  the  current  s-time.  But  that  is 
exactly  the  data  item  describing  the  next  (earlier  in  execution,  lower  s-time)  R-life  for  the 
register  just  processed  by  stage  2.  An  exception  may  occur  when  the  same  instruction 
loaded  two  registers,  and  hence  started  two  R-lives,  in  which  case  their  order  in  the  file  may 
be  the  reverse  of  what  stage  2 expects.  Consequently  data  space  is  needed  to  describe  in 
full  exactly  one  R-life  for  each  physical  register,  plus  one  extra  R-life  possibly  being  held 
over  for  one  read  operation.  This  is  further  illustrated  in  Figure  4-18.  The  order  of 
events  during  the  interval  described  by  the  figure  is: 


During execution:

Before T0: R0, R2 and R3 are live.

At T0: R1 is loaded, L10 starts. R3 is accessed.

At T1: R0 is loaded using R0 as index. Hence L00 and L01 overlap at T1.

At T2: Last usage of L01 and L20.

At T3: Last usage of L10; R0 is loaded; hence L02 starts. R3 is accessed for the first
time since T0.

At T4: Last usage of L30; R1 is loaded; hence L11 starts.

At T5: Both R2 and R3 are loaded by the same instruction. L21 and L31 start.

At T6: Last use of L11.

After T6: R0, R2 and R3 are live.


During stage 1:

At T1: L00 is detected and its data item output.

At T3: L01 is detected and its data item output.

At T4: L10 is detected and its data item output.

At T5: L20 and L30 are detected and their data items output in some order.


During stage 2: (Listed in order of occurrence in stage 2, i.e. by decreasing s-time).

S-time > T6: The data items DL02, DL21 and DL31 have been read and the last usages of
their lives processed. DL11 has been read but its transitions have not yet
been processed.

S-time = T6: Last use of L11 is processed.

S-time = T5: First usages of L21 and L31 are processed, assume in that order. After L21
has been processed a data item is read. By the above assumptions this is
DL30. Hence it will be held over in temporary storage, and DL20 is read from
the file, and entered into the tables. Next the first usage of L31 is processed
and DL30 is fetched from the temporary store and entered in the tables.

S-time = T4: The first use of L11 is processed and DL10 is read from the file. The last use
of L30 is processed.

S-time = T3: The first use of L02 is processed, and the data item DL01 is read. The last use
of L10 is processed.

S-time = T2: The last uses of L01 and L20 are processed.

S-time = T1: The first use of L01 is processed, the data item DL00 is read and its last use
immediately processed.

S-time = T0: The first use of L10 is processed, the data item for its previous life, if any, is
read.


Now assume R3 was dormant from T0 to T3. This would be detected by stage 1 at time T3;
the data item for the first part of L30 (call it DL30') would be output at this time. The data
item for the second part of L30 (i.e. DL30'') would be output at T5, as was DL30. During
stage 2, the data item DL30'' would be read at s-time T5, its usages processed at T4 and T3.
At T3 the data DL30' would be read, its last usage would be processed at T0, and so on as
before.


For  each  interval  of  time,  the  number  of  live  registers  is  given  at  the  bottom  of  the  diagram. 
In  the  latter  case  it  would  be  reduced  by  1 between  TO  and  T3. 


This  concludes  our  discussion  of  Figure  4-18. 
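
The quantity tabulated at the bottom of the diagram can be illustrated with a small sketch. It is not the two-stage, backward-reading algorithm described above: it assumes that all lives are already available in memory as hypothetical `(register, start, end)` tuples with half-open intervals measured in instruction counts, and does a simple forward sweep over the transition times.

```python
from collections import Counter

def live_count_histogram(lives, end_time):
    """Return {n: number of instructions during which exactly n registers were live}.

    lives    -- iterable of (register, start, end); a life covers [start, end)
    end_time -- total instruction count of the run (assumed >= every end)
    """
    delta = Counter()                 # net change in live count at each transition
    for _reg, start, end in lives:
        delta[start] += 1
        delta[end] -= 1

    hist = Counter()
    live, prev = 0, 0
    for t in sorted(delta):           # sweep the transition times in order
        hist[live] += t - prev        # 'live' registers held from prev to t
        live += delta[t]
        prev = t
    hist[live] += end_time - prev     # tail after the last transition
    return dict(hist)

example = [("R0", 0, 4), ("R1", 2, 6), ("R2", 3, 5)]
print(live_count_histogram(example, end_time=8))
```

A dormant period that is treated as a death would simply appear here as two shorter lives of the same register.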
FIGURE 4-18

A typical situation of register usage.


Assume our ISP has four registers, R0, R1, R2, R3. The successive lives of Ri are denoted
Li0, Li1, ... The diagram has one horizontal line for each register, as labelled. This line is
solid when that register is live. It is broken when that register is dormant. The vertical bars
correspond to times of transition, as marked on the time axis at the top.

[Diagram not reproduced. Time axis: T0, T1, T2, T3, T4, T5, T6. Rows: R0 (lives L00, L01,
L02), R1 (lives L10, L11), R2 (lives L20, L21), R3 (lives L30, L31). Bottom row LIVE: number
of live registers in each interval; the legible entries are 3, 4(3), 4(3), 2(1).]



The usage class of each R-life may be included in each data item on the intermediate file.
Hence, if the result of an analysis as described in Section 4.5 should indicate that
specialization of the registers is desirable, we may do this simultaneity determination for any
usage class we consider important, in addition to the set of all registers. The "state" of each
physical register has to be augmented to include its class, and an encoding of this class into
the (probably much fewer) classes for which output is desired must be devised. For each
output class a counter of live registers must be added.




We  performed  these  analyses  for  the  subclasses  of  R-lives  defined  in  Section  4.5,  as  well  as 
for  the  class  of  all  registers.  A typical  output  from  phase  2 is  displayed  in  Figure  4-19.  A 
compressed  form  of  the  results  from  all  the  subject  programs  is  given  in  figures  4-20 
through  4-22. 
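
As a hedged illustration of how such an output can be read, the sketch below derives the three columns (count, fraction, cumulative fraction) from a histogram of live-register counts, and finds the smallest N that suffices a given fraction of the time. The function names are hypothetical; the counts are the ANY USAGE counts of Figure 4-19. Fractions are taken relative to the sum of the counts supplied, which is an assumption that holds for the ANY USAGE class only.

```python
def distribution(counts):
    """counts[n] = instruction count during which exactly n registers were live.
    Returns rows of (n, count, fraction, cumulative fraction)."""
    total = sum(counts.values())
    rows, running = [], 0
    for n in sorted(counts):
        running += counts[n]
        rows.append((n, counts[n], counts[n] / total, running / total))
    return rows

def registers_sufficient(counts, fraction=0.90):
    """Smallest n such that at most n registers were live 'fraction' of the time."""
    for n, _count, _frac, cumulative in distribution(counts):
        if cumulative >= fraction:
            return n
    return None

# ANY USAGE counts for FORTEN Havie, as listed in Figure 4-19.
ANY_USAGE = {1: 166, 2: 1104, 3: 3171, 4: 14985, 5: 15092, 6: 481, 7: 298,
             8: 409, 9: 419, 10: 185, 11: 78, 12: 50, 13: 47}

print(registers_sufficient(ANY_USAGE, 0.90))  # 5
print(registers_sufficient(ANY_USAGE, 0.98))  # 9
```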
FIGURE 4-19

Output from simultaneously live register analysis for program FORTEN Havie.
Distribution of number of live registers in the different classes.


For each class, the first column gives the instruction count when exactly N registers were
live. Column 2 gives the fraction of the total instruction count for this state. Column 3 is
a cumulation of column 2; it gives the fraction of the instruction count when at most N
registers were live.


          NO ARITHMETIC           FIXPOINT ADD/SUB.       FIXPOINT MUL/DIV.
     N    Count  Fract.   Cum.    Count  Fract.   Cum.    Count  Fract.   Cum.

     1    25221   0.693  0.693     1960   0.054  0.054      410   0.011  0.011
     2     7680   0.211  0.904    23837   0.655  0.709      215   0.006  0.017
     3     1163   0.032  0.936     7460   0.205  0.913       14   0.000  0.018
     4     1038   0.029  0.964      198   0.005  0.919        0   0.000  0.018
     5      551   0.015  0.979      254   0.007  0.926        0   0.000  0.018
     6      433   0.012  0.991      134   0.004  0.930        0   0.000  0.018
     7      256   0.007  0.998        5   0.000  0.930        0   0.000  0.018
     8       41   0.001  0.999        0   0.000  0.930        0   0.000  0.018
     9       47   0.001  1.001        0   0.000  0.930        0   0.000  0.018
    10       14   0.000  1.001        0   0.000  0.930        0   0.000  0.018
    11        0   0.000  1.001        0   0.000  0.930        0   0.000  0.018
    12        0   0.000  1.001        0   0.000  0.930        0   0.000  0.018
    13        0   0.000  1.001        0   0.000  0.930        0   0.000  0.018
TOTALS    36444          1.001    33848          0.930      639          0.018


          FLOATING POINT          INDEXING                ANY USAGE
     N    Count  Fract.   Cum.    Count  Fract.   Cum.    Count  Fract.   Cum.

     1    18172   0.499  0.499    28218   0.775  0.775      166   0.005  0.005
     2     6446   0.177  0.676     5853   0.161  0.936     1104   0.030  0.035
     3       34   0.001  0.677      350   0.010  0.945     3171   0.087  0.122
     4        0   0.000  0.677      426   0.012  0.957    14985   0.412  0.534
     5        0   0.000  0.677      718   0.020  0.977    15092   0.415  0.948
     6        0   0.000  0.677      515   0.014  0.991      481   0.013  0.961
     7        0   0.000  0.677      335   0.009  1.000      298   0.008  0.969
     8        0   0.000  0.677       45   0.001  1.001      409   0.011  0.981
     9        0   0.000  0.677       18   0.000  1.002      419   0.012  0.992
    10        0   0.000  0.677        0   0.000  1.002      185   0.005  0.997
    11        0   0.000  0.677        0   0.000  1.002       78   0.002  0.999
    12        0   0.000  0.677        0   0.000  1.002       50   0.001  1.001
    13        0   0.000  0.677        0   0.000  1.002       47   0.001  1.002
TOTALS    24652          0.677    36478          1.002    36485          1.002
FIGURE 4-21

Number of registers sufficient 90% of the time
for the arithmetic classes previously defined. Classes denoted by
FLO = Floating, FIX = Full fixpoint, COU = Fixpoint add/subtract.

FIGURE 4-22

Number of registers sufficient 90% of the time
for the no arithmetic class (NOA), the indexing class (IND)
and the total class (TOT).
4.6.2 Cost of reducing the register block

The results just presented show clearly that, except for ALGOL programs and the ALGOL
compiler, at most 8 to 10 registers out of the 16 available are used simultaneously*, and that
many only for short intervals of time. If the processor were equipped with fewer registers
than this, a time and space cost would be incurred by having to store registers temporarily in
primary memory. Intuitively, it seems from the above results that for a moderate reduction
in the number of registers this cost would be low. We now describe an extension to our
algorithm which enables us to compute upper bounds for this time cost.

Assume we want to compute the additional time cost incurred by running the program on an
ISP with M registers but otherwise similar to the one we investigate. At some point in the
program we have N simultaneous lives, N > M. We select the N - M least useful lives as
described below, and assume that these can be interleaved with the remaining R-lives in the
registers used for the latter lives. That is: each time an omitted register is referenced,
another register must be temporarily stored, and the desired value loaded into it. This value
is stored after use, and the original value reloaded. The associated time cost is two STORE
LOAD pairs per reference to the selected lives, i.e. 4 instructions per reference if the
instruction count is used. If an R-life L so selected for omission is selected again at some
later time, but for the same M, the cost should not be added the second and later times.

This computation is done during the second stage described above, each time we process a
first use. It can be done simultaneously for all desired M, and for many criteria of usefulness
of lives. Data space used by the algorithm is proportional to the number of criteria times the
number of registers, but with a low factor (at most 5 words). The amount of computation


* The structure of an ALGOL program is almost like two coroutines calling each other, viz. the
user program and the run time support routines. These operate on disjoint memory cells and
almost disjoint sets of registers. Similarly the ALGOL compiler consists of a lexical analyser,
a syntax analyser and a code generator, each having its own set of registers allocated to it.
This probably accounts for the exceptional results obtained for ALGOL, and also indicates
how programs may be structured to use many registers effectively. A further explanation may
be the difficulty of detecting multi-instruction loads, as described on page 49.
involved is small. Hence this is a relatively cheap measure to compute once we are doing the
simultaneity analysis.

Several criteria of usefulness can be used to select which R-lives to omit. The following
were tried:

The least used lives.

The least densely used lives (usages per lifelength).

The shortest lives.

The longest lives. (Might be better than omitting many short ones.)

Of these, the "longest lives" never gave the lowest cost. The "shortest lives" criterion rarely
gave good results. Almost all the lowest results were obtained using the "least used" or
"least densely used" criteria. Furthermore the criterion giving the lowest cost often changed
with the number of available registers (i.e. M) even for the same program. It follows that, in
an analysis, several criteria should be used, including the first three above. The lowest cost
obtained in each case should then be used as an upper bound.
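
The cost computation and the comparison of criteria can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis program: lives are held in memory as hypothetical `Life` records rather than read from the intermediate file, lifelength is taken as `end - start + 1`, and the set of simultaneous lives is recomputed at every first use. Each omitted life is charged 4 instructions per reference, and is charged only once however often it is selected.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Life:
    name: str
    start: int   # instruction count of the first use (the load)
    end: int     # instruction count of the last use
    refs: int    # number of references during the life

# Each criterion maps a life to a utility; the lowest utility is omitted first.
CRITERIA = {
    "least used":         lambda l: l.refs,
    "least densely used": lambda l: l.refs / (l.end - l.start + 1),
    "shortest":           lambda l: l.end - l.start,
    "longest":            lambda l: -(l.end - l.start),
}

def reduction_cost(lives, m, utility):
    """Extra instructions if only m registers were available."""
    omitted = set()
    for first in sorted(lives, key=lambda l: l.start):       # at each first use
        live = [l for l in lives if l.start <= first.start <= l.end]
        excess = len(live) - m
        if excess > 0:
            omitted.update(sorted(live, key=utility)[:excess])
    return 4 * sum(l.refs for l in omitted)                   # charged once per life

def upper_bound(lives, m):
    costs = {name: reduction_cost(lives, m, key) for name, key in CRITERIA.items()}
    return min(costs.values()), costs

lives = [Life("A", 0, 100, 50), Life("B", 0, 100, 3),
         Life("C", 10, 20, 5), Life("D", 30, 40, 2)]
print(upper_bound(lives, m=2))
```

In this made-up example the "least densely used" criterion gives the lowest cost (12 instructions), because it omits the same long, sparsely used life at both points of congestion, whereas "least used" omits two different lives (20 instructions).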

We present a typical output in Figure 4-23, and a summary of the results from the whole
subject set in Figure 4-24. As is seen, the cost of reducing the number of registers in most
cases is low, less than a percent in some cases, and less than 15% in most, but running very
high in a few cases (70% - 100% increase in cost). We investigate this further below.

Note  that  3 of  the  programs  which  give  extremely  high  costs  are  ALGOL  programs,  and  just 
those  which  have  many  procedure  calls  and  parameter  transmissions.  Hence  the  arguments 
presented  above  about  the  coroutine  like  structure  of  ALGOL  programs,  and  also  the  error 
discussed  on  page  49  in  connection  with  undetected  loads,  apply  with  force  to  these  results. 


4.6.3 Some sources of error

We now discuss some sources of errors associated with this method.

The most significant is probably that the lives omitted are selected on the basis of their average
properties. A better selection might have been made, had the local properties of lives been
known. We discuss below how this can be done.
FIGURE 4-23


Cost of reducing number of available registers.

Lives with lowest utility are omitted; 4 utility criteria are used.
Sample output from program FORTEN Havie.

[Table not reproduced: the entries are not recoverable from this copy. Surviving column
groups: UTILITY: REFERENCES IN LIFE; UTILITY: DENSITY OF REFERENCES.]

FIGURE 4-24

Upper bound for time cost of reducing the register block
to 10, 8 or 7 registers respectively,
given as relative increase in instruction count.

Furthermore a program written for an ISP with few registers will be quite different in its
local structure from a program written with a large register block in mind. Hence this method
cannot be used to estimate the cost of large reductions in register block size. One would
also, a priori, believe this argument to hold for reduction to a relatively small number of
registers even if the program did not use many in the first place. This belief, however, is not
vindicated by our results.

For  the  same  reason  we  would  expect  the  upper  bounds  found  by  this  algorithm,  and  by  its 
modified  version  described  below,  to  be  considerably  higher  than  the  actual  cost  obtained  by 
average  to  careful  recoding  for  the  lower  number  of  registers. 

A third  source  of  errors  is  that  successive  lives  of  the  same  register  may  overlap  by  one 
instruction*,  hence  the  simulation  of  two  lives  in  one  register  may  not  be  valid.  We  have 
counted  the  number  of  such  overlaps  and  found  it  mostly  to  be  small  (see  Figure  4-25). 
Hence  this  source  of  errors  is  insignificant. 

Finally our simulation might be invalid because there were not enough registers available to
hold the necessary lives. Since at most 4 registers can be involved in any PDP-10
instruction, this error will not occur for M > 4. We never used M < 6.


4.6.4  Utilizing  dormant  periods 

We now consider a way to take local behaviour of registers into account when computing the
cost of running with a smaller register block. This is done by assuming that a register is
dead whenever it has been dormant for some time K. If this assumption should be wrong, a
time cost of one STORE, LOAD pair applies for each R-life prematurely terminated based on
the assumption.

We  can  detect  such  dormant  periods  during  the  first  stage  of  the  analysis.  Each  time  a 


* As when loading a register using the same register in the address calculation
(MOVE RG,FLOP(RG)). If we had used a finer grain of time, as discussed in Section 4.2, this
problem could have been avoided.
register is used, it is easily checked if its previous usage was more than K ago. If so, the
present usage is processed as a load, and a "prematurely killed" counter is updated.
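
The dormancy check just described can be sketched for a single R-life as follows. The function name and the representation of a life as a list of usage times (instruction counts) are assumptions made for illustration; in stage 1 the same test would be applied to each register as the trace is read.

```python
def split_at_dormancy(use_times, k):
    """Split one R-life wherever two successive usages are more than k apart.

    use_times -- increasing instruction counts at which the life uses its register
    Returns (parts, prematurely_killed): the apparent lives, and how many times
    the life was cut (each cut costs one STORE, LOAD pair).
    """
    parts, current = [], [use_times[0]]
    for prev, now in zip(use_times, use_times[1:]):
        if now - prev > k:           # dormant for more than k: treat 'now' as a load
            parts.append(current)
            current = []
        current.append(now)
    parts.append(current)
    return parts, len(parts) - 1

print(split_at_dormancy([0, 5, 300, 310, 900], k=200))
# ([[0, 5], [300, 310], [900]], 2)
```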

The effect of this trick is that a register will appear to be dead whenever it has a long
dormant period. Hence during this apparently dead period, the number of live registers is
reduced by one. Non-overlapping R-lives of other registers, occurring within this period, can
be accommodated in the apparently dead register at no cost beyond that of saving and
restoring the dormant life once (i.e. one STORE LOAD pair). This cost is at most half of the
cost of interleaving any two lives, and independent of how many other lives are accommodated
in the dormant register. Since most R-lives are short, we would expect a considerable
decrease of cost to be obtained this way. However, since each choice of K requires a
separate intermediate file, at least logically, and the simultaneity determination has to be
done for each of these, it is a more costly analysis to apply.

An alternative approach is to use a hybrid method: some reasonable K is chosen for phase
one, and the interleaving process is applied in phase 2. If the cost so obtained seems
unreasonably high, a new analysis can be run using a smaller K.

For our experiments we used this hybrid method. Unless otherwise specified, K was chosen
to be 200 throughout all the experiments. The number of lives prematurely terminated by
this assumption is tabulated in Figure 4-26. Note that if the same life has several dormant
periods of length more than K, each non-dormant period is counted as a life.

To see the effect of varying K, we performed some experiments with K=100, K=60, K=40 and
K=25. For this purpose we chose programs that gave particularly high cost with K=200, in
the hope that cost could be reduced this way. The programs chosen were the ALGOL
versions of Ising, Havie and Crout, and the FORFOR version of Crout. For comparison we also
included two programs where the analysis algorithm performed well, i.e. where the results
for K=200 were regular and the costs low. These were the FORTEN versions of Havie and
Crout. The results are displayed in Figure 4-27.

The overall trend of these results is that the upper bound of the cost can be reduced
considerably by using a small K. However, there is a point where the cost from storing and
restoring dormant lives becomes comparable to the cost of interleaving lives, and the total
cost rises. This point is higher (larger K) the lower the cost of interleaving. We have at
present no mechanical way of guessing what K will be optimal for a given program without
performing a series of experiments. By choosing K as low as 25, the cost of reducing the
FIGURE 4-25

Fraction of lives overlapping their successor

FIGURE 4-26

Lives prematurely terminated by 200 instructions dormancy rule

register block was dramatically reduced for those programs where this cost previously was
high. The increase in instruction count for reducing to 7 registers was in all cases but one
brought below 20%. We believe the cost for this program could be brought further down by
using even lower K.

The  cost  obtained  by  any  of  these  methods  is  an  upper  bound,  hence  we  may  safely  assume 
the  smallest  of  them  to  be  a valid  upper  bound. 


4.6.5  Summary 

The maximal number of registers used simultaneously by any of our 41 subject programs is
15. For 17 programs it is 10 or less. 10 registers would suffice 90% of the time (instruction
count) for all the programs, 98% of the time for 36 of them. 8 registers would suffice 90% of
the time for 36 programs, 98% of the time for 29 programs.


BLISS  programs  use  the  fewest  registers,  BASIC  programs  also  use  few.  Hence  time  efficient 
programs  do  not  necessarily  use  many  registers.  ALGOL  programs  use  most  registers,  but 
not  more  than  maximally  used  by  FORTRAN  programs.  The  compilers  use  no  more  registers 
than  the  small  programs,  and  the  reduction  costs  for  the  compilers  are  not  significantly 
higher  than  for  the  small  programs.  Hence  the  size  and  complexity  of  the  program  has  little 
influence  on  these  results. 


The results for the individual classes show that 90% of the time 2 floating point accumulators
would be sufficient for all the programs, 1 register with full fixpoint abilities would be
sufficient except for the FORFOR version of Crout, and 5 registers with fixpoint addition and
subtraction would suffice for all programs. Similarly, 7 registers without arithmetic
capabilities and 9 indexing registers would be sufficient 90% of the time for all the programs.


All the above results are obtained on the assumption that a register is dead when it has
been dormant for 200 instructions. Our experiments using a shorter period indicate
that lower results would be obtained that way.


If the register block were to be reduced to 8 registers, the increase in instruction count
would be less than 5% in 30 of the programs, less than 20% in 36 of them. Again the results



FIGURE 4-27

Relative increase of instruction count by interleaving R-lives
as a function of K and M, for selected subject programs.

[Table not reproduced: the column alignment is not recoverable from this copy. Columns:
ALGOL Ising, ALGOL Havie, ALGOL Crout, FORFOR Crout, FORTEN Crout, FORTEN Havie.
Rows, each given for maximal dormancy K = 200, 100, 60, 40 and 25: lives added by
dormancy saving; dormancy part of relative increase; total increase for reduction to
10 registers; total increase for reduction to 8 registers.]


are based on maximal dormant periods of 200 instructions. Additional experiments, using 4
of the programs where reduction was most costly, show that by reducing this period to 25
instructions the costs were reduced from 44%, 58%, 33% and 76% to 12%, 9.4%, 9% and 18%
respectively, for these 4 programs. We did not investigate if a further reduction to 20 or 15
would reduce the cost further.

The cost is particularly high for ALGOL programs. This is discussed in a footnote on page 74.
FORFOR Crout also has a high cost, and its cost was the hardest to reduce by decreasing the
maximal dormancy. For BLISS and BASIC programs the reduction was particularly cheap, less
than 1% for each program, including the two compilers written in BLISS. The correlation
between the two FORTRAN versions is not particularly good.


4.7  Utilities  of  values 

The methods just described are aimed at establishing the effect of reducing the register
block, and our experiments indicate that the registers on the whole are not used very
efficiently. However, there might be values in memory that could benefit by being kept in
registers if the programmer or compiler had realized it. Hence it would be desirable to have
a utility measure which indicates what values are most important, locally in time, at each
point in the computation. Those values should be kept in registers which have the highest
utility at that point in time. Further, if values of high utility cannot be held in registers, we
have an indication that more registers should be included in the processor. The converse
holds if only a few values have high utility.

Such  a measure  must  give  greatest  importance  to  values  used  by  the  current  instruction,  less 
weight  to  values  used  further  away  in  the  instruction  stream.  The  function  w(s)  below  is 
intended  to  express  this.  Furthermore  to  simplify  computations,  we  might  not  want  to 
consider  all  accesses  to  a value,  only  those  within  some  interval  of  time  containing  the 
current  instruction  execution.  This  is  expressed  by  the  function  i(s). 


A class of such measures can be defined as follows: Define the utility of a value V at time t
to be:

$$P(V,t) = \int_{0}^{\infty} w(T-t) \, i(T-t) \, u(V,T) \, dT$$

where

w(s) is a weighting function

i(s) is 1 in the interval considered, 0 elsewhere

u(V,t) is 1 if V was used by an instruction executed at time t,
0 otherwise.

w(s)  and  i(s)  can  be  chosen  freely  to  obtain  different  measures  of  utility,  whereas  u(V,t)  is  a 
formalization  of  the  trace.  In  choosing  w(s)  and  i(s)  one  must  take  care  that  values  used  by 
the  current  instruction  get  a higher  utility  than  any  other,  regardless  of  how  much  they  are 
used  in  the  surrounding  interval. 

It is reasonable to use the instruction count as the time measure rather than the computed
time. Some tentative choices for interval functions can then be classified as:

[n,m] : i(s) = 1 for the interval containing the last n and next m uses of the value,
0 otherwise.

(n,m) : i(s) = 1 for the last n and next m instructions,
0 otherwise.

One such measure could be defined as follows:

Let k be the next time value V will be used, i.e.:
u(V,T) = 0 for T in [t,k),
u(V,T) = 1 for T = k,
u(V,T) is irrelevant otherwise.


Now let

$$i(s) = \begin{cases} 0 & \text{for } s < 0 \ (T < t) \\ 1 & \text{for } 0 \le s \le k \\ 0 & \text{for } s > k \end{cases}$$

and let

$$w(s) = \frac{1}{|s| + 1}$$

I.e. P(V,t) is inversely related to the time until the value will next be used. This interval
function is [0,1]. The same weighting function is naturally extended to any (n,m) or [n,m]
interval.
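
A small sketch may make this particular measure concrete. It assumes discrete time (instruction counts), so the integral becomes a sum with a single non-zero term at the next use, and it treats k as an absolute time, so that the weight is evaluated at s = k - t. The function name and the list-of-use-times representation of the trace are hypothetical.

```python
def utility(use_times, t):
    """P(V,t) for the [0,1] interval function with w(s) = 1/(|s| + 1).

    use_times -- instruction counts at which value V is used
    t         -- current instruction count
    Only the next use at or after t contributes; a value that is never
    used again gets utility 0.
    """
    future = [u for u in use_times if u >= t]
    if not future:
        return 0.0
    k = min(future)                  # next time V will be used
    return 1.0 / (abs(k - t) + 1)

uses = [3, 10]
print(utility(uses, 3))    # 1.0: used by the current instruction
print(utility(uses, 4))    # 1/7: next use is 6 instructions away
print(utility(uses, 11))   # 0.0: no further use
```

A value used by the current instruction gets utility 1, which no other value can reach, as required of w(s) and i(s) above.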


It is obviously impractical to perform such a calculation for all memory locations at all times.
It is sufficient, however, to consider those locations that are "live" or "active" at each point
in time. Detection of such active periods of memory locations (M-lives) can be done in a way
much similar to the detection of register lives. Some number K must be selected as the
maximal dormant period permitted within an M-life. This corresponds roughly to an interval
function of type (K,K). Since every location must be referenced at least every Kth instruction
in order to stay live, at most K locations can be live simultaneously. A K chosen for this
purpose would hardly be larger than 256. Hence the data space required for detection of
M-lives is definitely manageable. A hashing scheme must be used to access the tables of M-life
data, rather than the register address that was used for the R-life tables. Finally we must
keep track of values that migrate from memory to registers and back.
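
Since this analysis was only proposed, the following is merely one possible sketch of M-life detection. It assumes a trace with one memory reference per instruction, uses a Python dictionary in place of the hashing scheme, and ignores the migration of values between memory and registers. All names are hypothetical.

```python
def detect_m_lives(trace, k):
    """Return M-lives as (address, first_use, last_use), ordered by first use.

    trace -- sequence of memory addresses, one reference per instruction
    k     -- maximal dormant period permitted within an M-life
    """
    open_lives = {}     # address -> [first_use, last_use]; stands in for the hash table
    finished = []
    for now, addr in enumerate(trace):
        # Close every life that has been dormant for more than k instructions.
        expired = [a for a, (_first, last) in open_lives.items() if now - last > k]
        for a in expired:
            first, last = open_lives.pop(a)
            finished.append((a, first, last))
        if addr in open_lives:
            open_lives[addr][1] = now          # extend the current life
        else:
            open_lives[addr] = [now, now]      # start a new life
    finished.extend((a, first, last) for a, (first, last) in open_lives.items())
    return sorted(finished, key=lambda life: life[1])

print(detect_m_lives([1, 2, 1, 3, 3, 3, 3, 1], k=2))
# [(1, 0, 2), (2, 1, 1), (3, 3, 6), (1, 7, 7)]
```

Scanning all open lives at every step is acceptable here only because the number of open lives is bounded by roughly K, as argued above.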

An  appropriate  weighting  function  would  probably  take  into  account  only  future  usages  of 
the  location.  By  using  a lookahead  of  K instructions,  the  utilities  of  the  live  memory  locations 
could  be  calculated. 

We did not do this, but propose it as a possible tool to use for assessing the utility of a
larger register block, or to assess the optimal size of a register block assuming a future more
intelligent compiler.
4.8 Register structure, Conclusions

We now conclude the presentation of our methods for register structures. We have shown
how to detect register lives, how to find the number of simultaneous lives and how to find an
upper bound on the time cost incurred if the number of registers were to be reduced. Our
results are summarized in sections 4.4.1, 4.5.1 and 4.6.5. On the whole, our experimental
results seem to indicate that the time cost incurred by having only 8 general registers on the
PDP-10 would not be excessive. (This assumes that instruction word space was needed for
other purposes.)

This  number  depends,  of  course,  on  other  architectural  properties  of  the  ISP.  If  the 
registers  were  specialized,  or  if  base  registers  were  introduced,  a larger  number  of  registers 
would be needed. This is clearly seen in the results of Alexander [AleW72]: 4 or more
registers in the IBM 360 were kept busy as base registers. On the other hand, if the
registers  were  removed  from  the  address  space  and  no  register  to  register  operations  were 
introduced,  memory  would  have  to  be  used  for  temporaries,  and  fewer  registers  would  be 
needed. 
It should also be noted that the results for a reduced register block, though they are upper
bounds in one sense, cannot be attained unless the register allocation policy of the
compilers is sufficiently clever. In particular, dormant periods should be recognized, and no
registers should be allocated to a fixed purpose.

Finally we point out that a reduction in the number of registers, or a specialization of them,
is likely to imply a higher programming cost, since the programmer will have to give more
thought to how he allocates them.

On the whole, register usage is determined more by the language and its implementation than
by the algorithm. This is not surprising, since the programmer usually has no control over
register usage. The observation is particularly true for languages that use a run time
system, or otherwise impose a strong regimen on the structure of their object code. Thus
our ALGOL and BASIC programs distinguish themselves in most of the results in this chapter,
whereas systematic register use by BLISS and FORTRAN is lacking.

We  have  also  presented  a method  for  classifying  register  lives  with  the  object  of  assessing 
the  need  for  generality  of  registers.  Again  our  results  indicate  that  register  generality  is  not 
extremely  beneficial  to  program  efficiency,  and  that  little  would  be  lost  if  the  PDP-10  had, 
say,  2 floating  point  accumulators,  2 fixed  point  accumulators  and  8 index  registers. 
However,  the  other  motivations  for  general  registers  have  not  been  invalidated. 
CHAPTER 5

DATA  TYPES  AND  OPERATORS 


We  now  turn  to  the  data  types  of  the  processor,  and  the  operators  to  manipulate  data  of 
these  types.  We  look  at  two  problems: 


a)  How  to  detect  types  and  operators  that  are  in  the  ISP,  but  are  not  sufficiently  used 
to  justify  their  inclusion.  This  is  done  by  frequency  counts  and  various  derivatives 
thereof,  as  described  in  Section  5.1. 

b)  How  to  detect  data  types  and  operators  that  are  not  in  the  ISP,  but  could  be  included 
at  a benefit.  This  problem  may  be  approached  by  studying  instruction  sequences  and 
operand  values.  We  discuss  this  in  Section  5.2  through  Section  5.5. 

Again,  we  will  be  mostly  concerned  with  the  time  cost.  Most  of  the  methods  described  in  this 
section  also  apply  to  control  operators  and  in  part  to  address  calculation  methods,  as  will  be 
further  discussed  in  Chapter  6 and  Chapter  7.  As  an  introduction  we  give  some  general 
comments  on  data  types  and  the  associated  costs. 

A data type is an interpretation rule which assigns meaning to the contents of one (or more)
word(s), or parts of words. A data type is present in a computer if there are instructions
that manipulate it. We list some commonly occurring data types and in some cases the
associated operations or other characteristics.

Word  (LOAD,  STORE) 

Arithmetic  (Test  of  magnitude  or  sign) 

Integer  (Single,  multiple  or  variable  length) 

Floating  point  (Single,  multiple  or  variable  length) 

Address  (LOAD,  STORE) 

Bit  (Test,  set) 

Bit  vector  (One  word,  logical  and  other  operators) 

Character  (Including  8-bit  bytes  as  in  the  IBM  360  etc.) 

Character  string 

Byte (Variable-length bit string or field)

Byte string

Byte pointer (Generalized address)

Word  vector 

Vector 

Matrix 

Array 

List 

Stack 

Stack  pointer 
Instruction  (Execution) 

This list is not exhaustive, and the types listed are neither well defined nor disjoint. Some
exist only for transfer purposes, the data operations being subsumed under some other type.
Some are generalizations of others, e.g. the PDP-10 byte and byte pointer types generalize
all partial word transfer operations (Address, bit, character, character string etc.). The
variable length arithmetic types will usually only exist on character or decimal based
machines, i.e. business oriented machines.

The  cost  of  including  a data  type  in  an  ISP  has  several  components: 

Consumption  of  space  for  the  opcodes  in  the  instruction  word. 

Cost  of  hardware  to  implement  it. 

Possibly  longer  time  to  decode  the  whole  instruction  set. 

A data type included in the ISP should be used sufficiently to warrant these costs, as
discussed in Section 5.1.

On  the  other  hand,  a data  type  or  some  of  its  operators  might  not  be  present  in  the  ISP 
although  it  is  much  needed  in  applications.  This  usually  means  that  the  necessary  data 
structures  and  operators  have  to  be  implemented  (interpreted)  in  terms  of  the  existing  data 
types  and  their  operators.  The  cost  shows  up  as: 

Increased  execution  time 
Increased  space  for  program 
Increased  time  for  programming 
Possibly  increased  space  for  data 

Less  readable  programs,  implying  an  increased  programming  cost. 

This is discussed further in Section 5.2 through Section 5.4.

A missing but desirable data type might also be a variant of an existing type where the
existing type is used instead. Examples of such types might be short integers* or Booleans
(i.e. true/false valued). Since such types are simulated by existing ones**, their desirability
does not manifest itself as an instruction sequence. The costs of not having such data types
are:

Space cost of unnecessarily occupied memory.

Time cost of using the slower instructions.

We discuss this further in Section 5.5.


5.1  Frequency  counts 

The  obvious  way  to  expose  infrequently  used  data  types  and  operators  is  to  accumulate  the 
number  of  executions  of  each  instruction.  This  table  of  execution  counts,  the  instruction 
frequency  table  or  IFT,  is  another  compact  data  base  which  may  be  stored  and  used  at  a 
later  time  to  obtain  additional  information.  For  a given  ISP,  the  IFT  has  a constant  size, 
hardly  more  than  512  words  for  any  ISP. 

Once  it  is  built,  the  IFT  can  be  printed  out  sorted  by  opcode,  frequency  of  execution,  or  time 
spent  executing  each  instruction.  From  this  we  can  immediately  see  which  operators  are 
little  used  and  might  be  candidates  for  omission.  Similarly,  instructions  and  instruction  groups 
where  the  fraction  of  time  spent  is  significantly  larger  than  the  fraction  of  instruction 
executions,  are  possible  candidates  for  improved  implementation.  A variant  of  the  IFT  (see 
below)  is  presented  in  Appendix  D.  In  Figure  5-1  we  tabulate  the  number  of  different 
opcodes  used  by  each  subject  program,  and  in  Figure  5-2  we  tabulate  how  many  different 
opcodes account for 75%, 90% and 99% of the executed instructions for each subject program.
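
The bookkeeping described above can be sketched as follows. This is a minimal illustration in Python, not the analysis program used in this work. The function names and the ten-instruction trace are hypothetical, and the trace is assumed to be available as a sequence of opcode mnemonics.

```python
from collections import Counter

def build_ift(trace):
    """Accumulate the number of executions of each opcode in a trace."""
    return Counter(trace)

def opcodes_covering(ift, fraction):
    """Smallest number of most frequent opcodes that together account
    for at least `fraction` of all executed instructions."""
    total = sum(ift.values())
    covered = 0
    for n, (_, count) in enumerate(ift.most_common(), start=1):
        covered += count
        if covered >= fraction * total:
            return n
    return len(ift)

# Hypothetical trace of ten executed instructions.
trace = ["MOVE"] * 6 + ["ADD"] * 3 + ["JRST"]
ift = build_ift(trace)
```

With this trace, `opcodes_covering(ift, 0.75)` is 2 and `opcodes_covering(ift, 0.99)` is 3, which is the kind of figure tabulated per subject program in Figure 5-2.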

Clearly one cannot omit instructions from the ISP on the strength of their non-usage by one
program. Hence it is necessary to build IFTs that are the sum of IFTs for individual
programs. Summation can be over the whole subject set, or a subset thereof. When
computing such IFTs, the data for each program should probably be normalized to account
for the different program lengths, and also possibly weighted to account for the importance of
each subject program. We call such an IFT a SWIFT (Summed Weighted IFT).

* Partword loads and stores with fullword arithmetic are not in general sufficient because of
conventions for representing negative numbers, and overflow warnings.

** Fullword integers and bit vectors for short integers and Booleans.

Another form of summed IFTs is the SNIFT (Summed Normalized IFT). A SNIFT is reproduced
in Appendix D, including the printouts sorted by instruction count and computed time, as well
as the FGR function. It was computed by normalizing each subject program to one executed
instruction, summing the resulting IFTs, and renormalizing to 1 million. This permitted the use
of our existing program, using integer arithmetic, but caused a few rounding errors in the
type conversions. Hence the total counts given by the program are sometimes a few
instructions off the exact million. By scaling to a round number, the individual results are
easily interpreted as fractions. The FGR function and other results from this total SNIFT, and
the SNIFTs for the compiler set and the numeric and nonnumeric sets, are given at the
bottom of the respective tables in this section. Since we did not weight our programs, some
instructions, particularly unrounded arithmetic, which are frequent in some special contexts in
our short programs, received counts that seem unreasonably high.
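
The normalization and summation just described can be sketched as below. This is an illustrative Python version, not the program used for Appendix D; the function name `summed_ift` and its `weights` parameter are assumptions. With equal weights it yields a SNIFT, with unequal weights a SWIFT. Each input IFT is assumed to contain at least one executed instruction.

```python
from collections import Counter

def summed_ift(ifts, weights=None, scale=1_000_000):
    """Normalize each IFT to one executed instruction, form the weighted
    sum, and rescale the result to `scale` instructions (integer counts)."""
    if weights is None:
        weights = [1.0] * len(ifts)          # equal weights: a SNIFT
    summed = Counter()
    for ift, w in zip(ifts, weights):
        total = sum(ift.values())            # assumed to be nonzero
        for opcode, count in ift.items():
            summed[opcode] += w * count / total
    norm = sum(weights)
    # Rounding to integers is what can leave the grand total a few
    # instructions off the exact scale value.
    return {op: round(scale * share / norm) for op, share in summed.items()}

# Two hypothetical programs of different length.
snift = summed_ift([{"MOVE": 3, "ADD": 1}, {"MOVE": 1, "FADR": 1}])
```

Here the short second program contributes as much as the first, so its `FADR` executions receive a quarter of the total, which illustrates how an instruction that is frequent in one small program can obtain a high count in an unweighted sum.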


FIGURE  5-1 

Number of different opcodes used by subject set.


FIGURE 5-2

Number of opcodes accounting for
75%, 90% and 99% of the executed instructions


For some of the above results, as for the computed time in general, individual instruction
execution times are needed. They can be taken from the manual of the processor in question
or other available sources. In some cases assumptions have to be made about the average
properties of the operands. These assumptions may have critical importance in the case of
variable length operands (including bytes) but should otherwise be of little consequence by
the law of large numbers. If variable length operands are common, this source of error may
be reduced by including in the trace sufficient information that the correct execution time
can be computed during analysis.
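
As a small illustration of how computed time follows from the IFT, the sketch below multiplies each execution count by a fixed per-instruction time and compares each opcode's share of the time with its share of the executions. The function name and the timings are hypothetical placeholders, not values taken from any processor manual, and operand-dependent timing is ignored.

```python
def count_and_time_shares(ift, exec_time):
    """For each opcode return (fraction of executions, fraction of computed time)."""
    total_count = sum(ift.values())
    times = {op: n * exec_time[op] for op, n in ift.items()}
    total_time = sum(times.values())
    return {op: (ift[op] / total_count, times[op] / total_time) for op in ift}

# Hypothetical counts and hypothetical execution times (arbitrary units).
ift = {"MOVE": 8, "FDVR": 2}
exec_time = {"MOVE": 1.0, "FDVR": 6.0}
shares = count_and_time_shares(ift, exec_time)
```

In this made-up case the divide instruction accounts for 20% of the executions but 60% of the computed time, which is the pattern that marks an instruction as a candidate for improved implementation.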

Except for the possible dependence of instruction times on operands, tracing is too powerful
a tool to obtain the IFT. A counter in each straight line piece of code in the subject program
plus the necessary data on each such piece, or jump tracing, would be sufficient. Tracing
does, however, have the advantage of general applicability as discussed in Chapter 1.


We  now  discuss  some  further  measures  computed  from  the  IFT. 


5.1.1  Instruction  classification  - Mixes 

In order to better see the relation of the instruction executions to the data types and other
programming structures, we may group our instructions into classes and print the
distributions of instruction counts or computed time over the classes. The classification may
be by data type, control function or other properties. In some cases several data types may
be grouped into one class. In other cases a data type may be split into several classes etc.,
depending on the questions to be asked. This may be viewed as mapping the instruction set
into a generalized and smaller instruction set.

Two such classifications were used in our work. One of these was devised by Gibson [GibJ70] in
1959, and used to obtain the well known Gibson mix. It has later been modified to fit more
modern computers by Gonter [GonR69] and the present author. This classification was
intended mostly for comparison of the internal processing power of different
processors. Another classification, the Program Structure classification (or PS classification),
was developed by the present author. It is intended to reflect the control operators of a
program in a better way than does the Gibson classification. The definitions of the
classifications are given briefly in Figure 5-3 and Figure 5-4. For the full definition of
the Gibson classification we refer to the papers by Gibson and Gonter.

We use the term distribution (Gibson distribution, PS distribution) to denote the observed
distributions for any (set of) program(s). By a mix we mean the observed distribution for a
set of programs believed to be representative of some actual workload (e.g. the Gibson mix
[GibJ70], the UMASS mix [GonR69] etc.).

A classification  is  easily  described  by  a table  with  one  entry  for  each  instruction  in  a 
standard format, and with some further entries describing the number of classes etc., and
giving  their  print  names.  This  table  can  be  interpreted  by  the  program  computing  (and 
printing)  the  distribution  over  the  classes  and  the  same  program  can  be  used  for  all 
distributions. 
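
A table-driven classification of this kind can be sketched as below. The table fragment is hypothetical: the class names are borrowed from Figure 5-4, but the assignment of these particular opcodes to classes is an assumption made for illustration, as are the function name and the default class.

```python
from collections import defaultdict

# Hypothetical fragment of a classification table: opcode -> class print name.
CLASS_OF = {
    "MOVE": "Word to acc.",
    "MOVEM": "Word to memory",
    "ADD": "Fixp. add sub.",
    "SUB": "Fixp. add sub.",
    "JRST": "Uncond. jumps",
}

def distribution(ift, class_of, default="Miscellaneous"):
    """Percentage of instruction executions falling in each class."""
    totals = defaultdict(int)
    for opcode, count in ift.items():
        totals[class_of.get(opcode, default)] += count
    grand_total = sum(totals.values())
    return {cls: 100.0 * n / grand_total for cls, n in totals.items()}

# Hypothetical IFT.
dist = distribution({"MOVE": 5, "ADD": 2, "SUB": 1, "JRST": 2}, CLASS_OF)
```

Because the table is data rather than code, the same `distribution` function serves for the Gibson classes, the PS classes, or any other grouping; passing computed times instead of counts gives the distribution of time.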

The original Gibson mix for the IBM 650 and 704, the UMASS mix for the CDC 3600, and the
Gibson distribution for our subject set from the PDP-10, are reproduced in Figure 5-3.
Our program structure distribution for the subject set and its subsets is given in Figure
5-4. When studying such distributions one should keep in mind that the number of
instructions in each class is not the same. Hence a class of a few instructions, each used
with average frequency, may have a low count compared to a class of many instructions that
are each little used.


5.1.2  The  FGR  function  and  similar  measures 

The  most  striking  observation  from  a quick  glance  at  an  IFT  is  that  a small  number  of 
instructions  account  for  a large  fraction  of  the  executed  instructions.  An  abbreviated  form  of 
our  results  is  displayed  in  Figure  5-1  and  Figure  5-2.  This  suggests  that  one  might  reduce 
the instruction set and set of data types at a low cost. Foster et al. [FosC71a] have
proposed two measures related to this; they were both defined in Section 1.4, but we repeat
the definitions here.

One  of  their  measures  is  the  information-theoretic  measure  of  information  content: 

    I = - \sum_{i=1}^{T} p_i \log_2(p_i)

where

    p_i is the probability of using the i'th opcode
    T is the total number of different opcodes
    \log_2 is the logarithm base 2


Their other measure is a function computed as follows: Order the operation codes by
frequency of occurrence. The i'th opcode in this ordering occurs C_i times, i.e.
C_i \ge C_{i+1} for 1 \le i \le T-1. Let P be the total number of instructions in the
sample. The FGR function is then defined as:

    FGR(N) = 1 - \frac{1}{P} \sum_{i=1}^{N} C_i      (1 \le N \le T)

FGR(N)  is  that  fraction  of  the  instructions  which  would  have  to  be  interpreted,  were  the 
instruction  set  reduced  to  the  N most  frequent  instructions.  However,  the  function  does  not 
guarantee  that  the  implied  recoding  is  possible  or  feasible. 
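
The definition translates directly into a few lines of code. The sketch below is illustrative only; the function name `fgr` and the three-opcode table are hypothetical.

```python
def fgr(ift, n):
    """FGR(N): fraction of executed instructions that fall outside the
    N most frequently executed opcodes."""
    counts = sorted(ift.values(), reverse=True)   # C_1 >= C_2 >= ... >= C_T
    total = sum(counts)                           # P
    return 1.0 - sum(counts[:n]) / total

# Hypothetical IFT with T = 3 opcodes and P = 10 executed instructions.
ift = {"A": 6, "B": 3, "C": 1}
```

For this table FGR(1) is 0.4, FGR(2) is 0.1 and FGR(3) is 0, so reducing the set to its two most frequent opcodes would leave one instruction in ten to be interpreted.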


Both of these measures are easily computed from the IFT. They may be computed based on
the number of executions of each instruction, i.e. using the instruction count, or based on the
time spent executing each instruction, i.e. using the computed time. The exact instructions
"removed" depend, of course, upon this choice. In the latter case, C_i should be the time used
by the i'th instruction when the instructions are ordered by the time spent executing them.
Both the information-theoretic measure and the FGR function may also be computed from
static data, and will then measure cost of representation rather than cost of execution.

We have computed the information-theoretic measure with respect to both instruction count
and computed time. Although the practical value of these measures is small, they give some
indication of the overall utilisation of the instruction set. The results are tabulated in Figure
5-5.
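
For reference, the information-theoretic measure can be computed from an IFT as sketched below. The function name is an assumption; opcodes with a zero count are skipped, since they contribute nothing to the sum.

```python
import math

def information_content(ift):
    """I = -sum(p_i * log2(p_i)) over the opcodes that were executed."""
    total = sum(ift.values())
    return -sum((c / total) * math.log2(c / total)
                for c in ift.values() if c > 0)

# Four equally frequent opcodes carry exactly 2 bits per instruction.
uniform4 = information_content({"A": 5, "B": 5, "C": 5, "D": 5})
```

The same function works on computed times in place of counts. As a check on the maximum quoted in Figure 5-5, log2(423) is approximately 8.7245, so that value is consistent with 423 equally probable opcodes; the count of 423 is an inference from the quoted maximum and is not stated in the text.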

A much better measure is the FGR function, which gives an estimate of the time cost incurred
by reducing the instruction set. We compute this based on instruction count, and with a
simple extension. Assuming that each of the omitted instructions can be recoded in terms of
K of the N remaining instructions, one may easily compute the relative increase in instruction
count. If the instructions used for the recoding are of average time, the relative increase in
computed time will be the same as that in instruction count. The increase in space cost has
to be found by static methods; the FGR function computed using static instruction counts
gives the fraction of written instructions that have to be rewritten.

In Figure 5-6 we tabulate the extended FGR function for N=64, N=48 and N=32, assuming
a recoding factor (K) of 4, i.e. on the average 4 instructions needed to interpret each omitted
instruction. This factor is the most significant source of error and is very hard to estimate:
many of the infrequently executed instructions would require many other instructions to mimic
exactly, yet they are used where minimal changes of a larger context would get the intended
operation done at no or very little extra cost. Hence the choice of K should be based on
which instructions are candidates for omission. If, for instance, the floating point
instructions are in danger, a factor of 4 will certainly be too low.
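
The text does not spell out the formula behind the extended function, so the sketch below rests on an assumption: the relative increase is taken as K times FGR(N), i.e. K additional instruction executions for every execution of an omitted instruction. The alternative (K-1) times FGR(N), which credits the one instruction being replaced, is available through a flag. The function names and the flag are hypothetical.

```python
def fgr(ift, n):
    """Fraction of executions outside the N most frequent opcodes."""
    counts = sorted(ift.values(), reverse=True)
    return 1.0 - sum(counts[:n]) / sum(counts)

def extended_fgr(ift, n, k=4, credit_replaced=False):
    """Estimated relative increase in instruction count when the set is
    reduced to its N most frequent opcodes and each omitted instruction
    is recoded in K remaining ones (assumed formula, see lead-in)."""
    factor = k - 1 if credit_replaced else k
    return factor * fgr(ift, n)

# Hypothetical IFT: FGR(2) = 0.1, so the estimate is 0.4 with K = 4.
ift = {"A": 6, "B": 3, "C": 1}
estimate = extended_fgr(ift, 2)
```

The choice of K times FGR(N) is motivated by the reported figures. If at most 28 or 29 instructions account for 75% of the executions, as stated in Section 5.1.3, then FGR(32) cannot exceed 0.25, and the (K-1) form could not exceed 0.75 for K=4. Several N=32 entries in Figure 5-6 are larger than that (for example 0.792 and 0.926), which fits the K form but not the (K-1) form. This remains an inference from the tabulated values.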

Ideally  one  would  want  to  compute  these  costs  using  actual  recodings  of  each  omitted 
instruction.  This  might  also  give  some  information  on  the  possible  increase  in  space  cost  for 
data.  This  process  is,  however,  not  easily  mechanized.  Manual  recoding  is  time  consuming, 
since for each N considered one must code the missing instructions in the best possible way
using  the  N remaining  instructions.  Possibly  the  data  representation  must  also  be 
reevaluated  each  time.  The  recoding  may  also  depend  on  space  and  time  constraints  for  the 
particular  application. 

To  properly  see  the  costs  of  removing  data  types,  results  similar  to  those  from  the  FGR 
function  should  be  computed  by  removing  all  instructions  relevant  to  a data  type  rather  than 
the  least  frequently  used  ones.  The  results  of  such  a calculation  can  usually  be  predicted 
well  by  a glance  at  the  Gibson  or  PS  distribution  in  question.  Also,  we  believe  it  may  be 
more  relevant  in  many  cases  to  omit  certain  of  the  operations  of  the  data  type  rather  than 
the  whole  type. 


5.1.3  Summary  of  frequency  results 

Our experimental results indicate that a small number of instructions, at most 28, account for
75% of the executed instructions for any one of our subject programs, and that 112
instructions suffice for 99% of the instruction executions for any one program. No program
used more than 162 instructions. Assuming a recoding factor of 4, 30 of the 41 programs
could be run on a processor with 64 instructions at an increase of less than 5% in the
number of instruction executions. For 18 of the programs this increase is less than 2%, but
in 3 cases it runs as high as 20% to 30% (ALGOL, FORTEN Bairstow, FORFOR Bairstow).

The situation changes somewhat when we consider the need of the whole subject set. Based


FIGURE 5-3


The modified Gibson classification.

Percentage of executed instructions in the Gibson classes.
Percentage of time included for our subject set.

                         Gibson    UMASS    PDP-10    PDP-10
Class                                       count     time

(row label lost)                             1.0       0.6
Miscellaneous              5.3      0.0      1.5       1.7
Indexing                  18.0     13.4       -         -
Fullword                    -       6.9       -         -
I/O control                 -       0.0      0.1       0.0
Inter reg. transfer         -       5.0       -         -
Monitor communic.           -        -       0.0       0.0
User UUOs                   -        -       0.3       0.0

The  classes  are  not  equally  applicable  to  all  ISPs,  as  indicated  by  dashes.  This  applies  in 
particular  to  index  register  instructions. 


In Gibson's original classification, use of indexing was counted as an extra instruction in the
"Indexing" class; the "Compare" class consisted of the 3-way skips in the 704.

In the UMASS version of the Gibson classification, the "Compares" class consists of all the
vector search operations, "Indexing" is all the index register instructions, and "Fullword" is
all the 48 bit instructions. The "Inter register transfer" class also includes other
instructions that only manipulate processor state.

Gibson's results were obtained using mostly scientific programs, but also some business data
processing programs, coded in unspecified languages.

The UMASS results were obtained using assembly and FORTRAN coded programs, including
the FORTRAN compiler and the assembler.


FIGURE 5-4


The program structure distribution, part 1.

Percentage of instruction executions in each class
for the total subject set and its subsets.


Class                      Compilers   Nonnumeric   Numeric    Total

Word to acc.                  10.5        24.2        19.7      20.1
Word to memory                 4.6         9.4         7.2       7.6
Immediate to acc.              3.4         4.5         4.1       4.1
Set to acc.                    1.3         0.4         0.3       0.4
Set to memory                  1.2         0.2         0.5       0.5
Partword to acc.              10.8         4.0         3.2       4.4
Acc. to partword               2.4         0.5         0.7       0.9
Block move                     0.2         0.0         0.1       0.0
Set bits                       0.9         0.6         0.8       0.7
Add or sub. 1                  1.6         1.8         1.6       1.7
Fixp. add sub.                 5.3        14.5         9.7      10.8
Fixp. mul. div.                0.4         1.2         2.1       1.6
Floating arith.                0.0         1.4        15.1       8.6
Shifts                         1.0         4.6         4.1       3.9
Logic                          2.1         0.7         0.9       1.0
I/O transfer                   0.0         0.1         0.1       0.1
I/O administr.                 0.0         0.0         0.0       0.0
Other monitor comm.            0.0         0.0         0.0       0.0
User UUO                       0           0.5         0.3       0.3
Subr. jumps                    5.1         2.5         2.7       2.9
Subr. returns                  3.9         2.2         2.2       2.4
Stackptr. manip.               5.5         3.3         4.9       4.4
Test acc. vs. immediate        7.7         1.7         1.0       2.1
Test acc. vs. 0                2.5         1.8         2.1       2.0
Test acc. vs. memory           3.0         4.9         4.5       4.5
Test memory vs. 0              2.3         1.7         0.9       1.3
Bit tests                      7.4         1.2         1.4       2.0
Status tests                   0.1         0.0         0.4       0.2
Loop jumps                     3.9         3.3         3.6       3.6
Uncond. jumps                 12.7         8.2         5.8       7.4
No-ops                         0.0         0.0         0.0       0.0
Executes                       0.3         0.8         0.4       0.5
Miscellaneous                  0.2         0.0         0.0       0.0

The "Set to acc." and "Set to mem." classes load their destination with all zeroes or all ones.
The "Set bits" group sets individual bits in a word.


The program structure distribution, part 2.

Percentage of computed time in each class
for the total subject set and its subsets.

Columns: Compilers, Nonnumeric, Numeric. (The numerical entries of this part are not legible.)

Classes: Word to acc.; Word to memory; Immediate to acc.; Set to acc.; Set to memory;
Partword to acc.; Acc. to partword; Block move; Add or sub. 1; Fixp. add sub.;
Fixp. mul. div.; Floating arith.; Shifts; Logic; I/O transfer; I/O administr.;
Other monitor comm.; User UUO; Subr. jumps; Subr. returns; Stackptr. manip.;
Test acc. vs. immediate; Test acc. vs. 0; Test acc. vs. memory; Test memory vs. 0;
Bit tests; Status tests; Loop jumps; Uncond. jumps; Executes; Miscellaneous.
FIGURE 5-5


Information theoretical measure of opcode utilization.
Computed based on instruction count (IC) and computed time (CT).
Theoretical maximum (all opcodes equally probable) is 8.7245.


Algorithm\Language              ALGOL    BASIC    BLISS

Bairstow                 IC      4.64     4.49     4.85
                         CT      4.52     4.63     4.65
Crout                    IC      5.10     4.44     3.75
                         CT      5.15     4.51     3.67
Treesort                 IC      3.21     4.40     3.17
                         CT      3.03     4.51     3.16
PERT                     IC      4.91     4.39     3.93
                         CT      4.89     4.46     3.98
Havie                    IC      5.46     4.89     4.94
                         CT      5.36     4.85     4.66
Ising                    IC      5.19      -       3.88
                         CT      5.19      -       3.77
Secant                   IC       -        -        -
                         CT       -        -        -

Algorithm\Programmer              E        B        A

Aitken                   IC      4.26     4.27     4.09
                         CT      4.02     3.97     4.12

Source progr.\Compiler          ALGOL    BASIC    BLISS

Treesort                 IC      5.44     5.37     4.84
                         CT      5.48     5.20     4.73

The FORFOR and FORTEN columns and the summary rows (Total subject set, Compiler set,
Numeric set, Nonnumeric set) are only partly legible. The surviving fragments, whose
rows cannot be placed, are: FORFOR 5.38 5.00; FORTEN 5.37 4.83; FORFOR 5.20 5.29;
FORTEN 5.01 5.08.
FIGURE 5-6


The extended FGR function.

Relative increase in instruction count by reducing the instruction
set to 64, 48 or 32 instructions using a recoding factor of 4.


Algorithm\Language        N     ALGOL    BASIC    BLISS    FORFOR   FORTEN

Bairstow                 64     0.092    0.021    0.043    0.294    0.2?8
                         48     0.225    0.042    0.140    0.496    0.461
                         32     0.433    0.094    0.360    0.792    0.755
Crout                    64     0.022    0.006    0        0.006    0.005
                         48     0.134    0.016    0.001    0.032    0.017
                         32     0.447    0.081    0.023    0.174    0.093
Treesort                 64     0.003    0.001    0        0        0.000
                         48     0.006    0.004    0        0.000    0.002
                         32     0.026    0.018    0.000    0.003    0.007
PERT                     64     0.027    0.004    0        0.042    0.051
                         48     0.184    0.019    0.012    0.081    0.103
                         32     0.249    0.069    0.098    0.167    0.203
Havie                    64     0.018    0.024    0.029    0.059    0.077
                         48     0.222    0.060    0.010    0.115    0.128
                         32     0.750    0.454    0.235    0.216    0.224
Ising                    64     0.020     -       0        0.035    0.078
                         48     0.100     -       0        0.073    0.163
                         32     0.476     -       0.041    0.157    0.288
Secant                   64      -        -        -       0.024    0.026
                         48      -        -        -       0.060    0.058
                         32      -        -        -       0.184    0.160

Algorithm\Programmer      N       E        B        A        G        L

Aitken                   64     0        0        0        0        0
                         48     0.000    0.000    0.000    0.000    0.000
                         32     0.128    0.162    0.109    0.052    0.050

Source progr.\Compiler    N     ALGOL    BASIC    BLISS    FORFOR   FORTEN

Treesort                 64     0.210  0.036  0.101  0.073  (one of the five entries is lost;
                                column assignment uncertain)
                         48     0.406    0.253    0.121    0.273    0.197
                         32     0.779    0.565    0.341    0.579    0.463

                          N =    128       64       48       32

Total subject set:              0.056    0.422    0.631    0.926
Compiler set:                   0.019    0.271    0.462    0.807
Numeric set:                    0.040    0.352    0.574    0.883
Nonnumeric set:                 0.010    0.199    0.342    0.585
on the SNIFT, the total number of instructions used is 274. 29 of these are sufficient to
account for 75% of the instruction executions, and 133 of them cover 99% of the instruction
executions. The increase in time cost for recoding in a 64 instruction set is 42.2%. This
recoding cost is well above the highest costs for individual subject programs. This shows
that although each individual program uses only a small set of instructions, this set is not the
same for all the programs. Recoding into a 128 instruction set would increase the time by
5.6% (Figure 5-6).

The results vary systematically with algorithm and language. BLISS programs generally use
fewest opcodes, and have the lowest recoding cost. This may in part be due to the total lack
of run time system in BLISS (no I/O initialization or timing unless explicitly requested). BLISS
programs are also as fast as, or faster than, the other programs for the same algorithm.
Except for Bairstow, ALGOL programs have the highest recoding cost for a 32 instruction set,
but the FORTRAN programs, except for Crout, are the most expensive to recode in a 64
instruction set. The recoding cost of SEC is comparatively low, whereas it is consistently
high for the compilers, though not higher than for several of the short programs. Treesort
has the lowest recoding cost in all languages; Bairstow has the highest, except in BASIC.
Hence there seems to be a correlation between the recoding cost and the size and
complexity of the program. This is as one would expect. The difference between the results
from the two FORTRAN versions seems significantly less than the difference between the
results for the different languages.

When  removing  an  instruction  from  an  existing  ISP,  one  should  not  only  consider  its 
frequency  of  usage,  but  also  the  ease  of  coding  it  in  the  remaining  instruction  set,  and  the 
degree  of  system  in  the  allocation  of  opcodes.  A break  in  such  a system  may  cause 
increased  programming  cost.  This  is  particularly  true  for  the  PDP-10,  which  has  a very 
systematic  instruction  set. 

The restricted selection of our subject set, and our use of SNIFTs instead of SWIFTs, cast
some doubt on our conclusions about the necessity of individual instructions in the PDP-10.
In particular, since all programs weigh equally, instructions used in special contexts in one of
the small programs will get high representations in the SNIFT. Furthermore, the omission of
I/O from the small algorithms leaves a time-consuming and specialized aspect of most
programs uninvestigated. We do, however, give some indications based on the SNIFT, which
intuitively seem relatively independent of these deficiencies.

Large sections of the logic instructions (only 6 out of 64 are used significantly), the bit test
instructions (9 of 64) and the halfword instructions could be removed. The systematic
allocation of opcodes would not be unduly broken, and few instructions would need
interpretation. There are also unused sections of the loop control group and the arithmetic
group.

The  UUOs  are  particularly  little  used.  Their  number  could  probably  be  reduced  to  7 (3  user 
+ 4 monitor)  or  15  (3+12)  by  encoding  information  about  function  in  the  address  field  or  in  a 
control  block.  UUOs  are  further  discussed  in  Section  6.1,  where  the  time  cost  of  using 
them  is  shown  to  be  high  relative  to  using  routine  call  instructions. 

Finally  there  are  many  no-ops  and  duplicate  instructions.  Removal  of  these  would,  however, 
break  the  systematic  allocation  of  operations. 

These  remarks  indicate  that  these  results  depend  more  on  the  algorithms  than  did  those  for 
registers.  Hence  a subject  set  should  be  chosen  to  cover  the  application  area  in  the  widest 
possible  way.  It  should  further  contain  as  wide  as  possible  a range  of  programming 
constructs. Commonly used languages should also be well represented. Finally one should
not  put  too  much  significance  into  the  results  from  one  or  a few  analyses,  particularly  not 
from  a small  program. 

We finally point out that the Gibson and program structure distributions (Figure 5-3 and
Figure  5-4)  indicate  that  there  is  also  a great  deal  of  commonality  between  the  results  from 
the  different  programs,  and  also  between  different  ISPs. 


5.2  Collection  of  instruction  sequences 

We  now  turn  to  the  problem  of  detecting  data  types  and  operators  that  might  be  added  to 
the  ISP  with  benefit,  and  which  represent  data  operations  genuinely  different  from  the 
existing  ones.  As  previously  noted,  one  way  of  detecting  such  operators  may  be  by 
observing  frequently  occurring  sequences  of  instructions,  viz.  those  sequences  used  to 
perform  the  data  operations,  representing  encodings  of  the  missing  instructions  in  terms  of 
the existing instruction set.


5.2.1 The program

We  first  describe  our  method  for  detecting  frequently  occurring  sequences  of  instructions. 
The  major  problems  are  due  to  the  need  for  space  and  time  efficiency  in  the  analysis 
program.  This  is  clearly  demonstrated  by  a glance  at  the  intermediate  results  of  a large 
analysis*: 1600 different pairs were found by our program**. If all of these were to be
extended  to  triples,  quadruples  etc.,  data  space  and  processing  time  requirements  would  soon 
become  prohibitive.  Hence  some  methods  are  needed  to  detect  and  omit  insignificant 
sequences. 

The data structure where the information is collected is essentially a forest of binary trees
[KnuD69]; each node represents a sequence, and each root corresponds to the first
instruction of the sequences represented in its tree. By a level (or level L) we mean all
nodes representing sequences of a given length L. The leader of a sequence of length L is
the L-1 first instructions in it. Its trailer is its L-1 last instructions. The descendants of each
node are:

a) The extension, i.e. the first of the nodes on the next higher level, representing an
extension of the sequence represented by this node.

b)  The  next,  i.e.  the  next  node  on  the  same  level  having  the  same  leader. 

To  facilitate  pruning,  as  described  below,  we  also  chain  all  nodes  on  the  same  level,  and  in 
order  that  we  may  reconstruct  the  sequence  represented  by  a node,  each  node  has  a back 
pointer  to  the  node  representing  its  leader.  Finally  each  node  contains  the  last  opcode  of 
the  sequence  it  represents,  the  occurrence  count  for  that  sequence,  and  its  length  (i.e.  the 
level  number  of  the  node). 
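
The node layout just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names Node, add_extension and sequence_of are hypothetical and do not come from the analysis program, object references stand in for the word-sized pointers, and the chain linking all nodes of a level is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    opcode: str                         # last opcode of the sequence
    level: int                          # length of the sequence
    count: int = 0                      # occurrence count
    leader: Optional["Node"] = None     # back pointer to the leader's node
    extension: Optional["Node"] = None  # first node on the next level
    next: Optional["Node"] = None       # next node with the same leader


def add_extension(parent: Node, opcode: str) -> Node:
    """Return the node extending `parent` by `opcode`, creating it if absent."""
    node = parent.extension
    while node is not None:
        if node.opcode == opcode:
            return node
        node = node.next
    node = Node(opcode, parent.level + 1, leader=parent, next=parent.extension)
    parent.extension = node
    return node


def sequence_of(node: Optional[Node]) -> List[str]:
    """Rebuild the opcode sequence of a node by following the back pointers."""
    opcodes = []
    while node is not None:
        opcodes.append(node.opcode)
        node = node.leader
    return opcodes[::-1]
```

A root is simply a node of level 1; sequence_of walks the leader pointers, which is why the text requires a back pointer in each node.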

For  efficiency  reasons  we  do  not  pack  the  nodes,  hence  7 words  are  needed  for  each***. 
2000  nodes  were  sufficient  for  the  analysis  of  all  the  subject  programs  except  FORTEN. 
About  2100  nodes  were  needed  for  the  first  pass  of  that  analysis,  the  1600  mentioned  on 
page  103  plus  512  for  level  1. 


* FORTEN,  295  000  instructions  traced. 

**  Which  were  reduced  to  61  after  applying  the  pruning  methods  to  be  described. 

*** Easily reduced to 4 words per node if using a language that makes the halfword load and
store instructions available.

To keep the forest of limited acreage, we use a multi pass algorithm. The first pass
accumulates  the  pairs,  each  subsequent  pass  extends  the  sequences  by  one  instruction,  thus 
adding  one  level  to  the  forest.  After  each  pass  the  forest  is  pruned.  The  pruning  not  only 
discards  insignificant  sequences,  but  also  attempts  to  recognize  closed  loops,  several 
representations  of  the  same  sequence,  etc.  If  significant  sequences  remain  after  pruning,  a 
new  pass  will  be  performed. 

This  continues  until  either  all  sequences  on  the  top  level  are  pruned  or  until  a predetermined 
level  (read  as  data)  is  reached.  In  the  latter  case,  the  user  of  the  program  may  decide  after 
each  pass  whether  to  continue.  His  decision  is  based  on  a few  simple  data  typed  as  each 
pass  is  completed.  Furthermore  the  current  version  of  our  program  saves  status  after  each 
pass  and  is  easily  restarted  if  inspection  of  the  output  indicates  that  longer  sequences  would 
be  of  interest,  or  in  case  of  machine  breakdown. 
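
The pass structure can be illustrated with a short Python sketch. It is not the author's program: plain dictionaries keyed by opcode tuples replace the forest, the pruning step is passed in as a function, and the names count_level and multipass are our own. The interactive decision to continue and the saving of status between passes are omitted.

```python
from collections import Counter


def count_level(trace, length, survivors):
    """Count sequences of `length` whose leader survived the previous pass."""
    counts = Counter()
    for i in range(len(trace) - length + 1):
        seq = tuple(trace[i:i + length])
        if length == 2 or seq[:-1] in survivors:
            counts[seq] += 1
    return counts


def multipass(trace, prune, max_level=20):
    """One pass per level; stop when pruning leaves nothing or at max_level."""
    levels = {}
    survivors = set()
    for length in range(2, max_level + 1):
        kept = prune(count_level(trace, length, survivors))
        if not kept:
            break
        levels[length] = kept
        survivors = set(kept)
    return levels
```

With a trivial pruning rule that keeps every sequence seen at least twice, the trace A B C A B C A B D stops after level 5, where only <A B C A B: 2> is left.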

Maximal program capacity is sequences of length 20. This limit was arbitrarily set since we
believed that sequences of this length neither would be found, nor would be of interest. This
turned out to be only partly true. Using the pruning algorithm outlined below, and cutting
each tree at the root when all its nodes at the top level are deleted, the algorithm is not
prohibitively expensive*. Hence in the experiments we used a typed in limit of 20. About
half of the analyses reached this level, all of them reached level 10.

After about the tenth pass of our algorithm very few sequences remain, hence each could
probably be extended by 5 or more in each pass without undue consumption of space. This
would make the method significantly faster, and permit the analysis to run until all sequences
terminated "naturally". It would, however, require some reprogramming.

At the end of the run the counts of shorter sequences are reduced to account for the
extension of these sequences into longer significant sequences. That is: starting at the top
level we visit each sequence in turn, and generate all its subsequences. For each such
subsequence we reduce its count by the count of the main sequence. Hence the final count
for each sequence reflects the unextendable fraction of the total number of occurrences of
this sequence. The computed time for each occurrence of the sequence is easily obtained, as
are the fractions of the total instruction count and computed time consumed by all
occurrences of the sequence.
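
A minimal Python sketch of this count reduction follows. It rests on assumptions the text leaves open: subsequences are taken to be contiguous, each position of a subsequence inside the longer sequence is reduced separately, and the count subtracted is the already reduced count of the longer sequence. The name reduce_counts and the dictionary layout (level number to table of counts) are ours.

```python
def reduce_counts(levels):
    """Subtract the counts of longer sequences from their subsequences.

    `levels` maps a sequence length to a dict {opcode tuple: count}.
    A reduced copy is returned; the input is left untouched.
    """
    reduced = {length: dict(table) for length, table in levels.items()}
    for length in sorted(reduced, reverse=True):      # start at the top level
        for seq, count in reduced[length].items():
            for sub_len in range(2, length):
                table = reduced.get(sub_len)
                if table is None:
                    continue
                for i in range(length - sub_len + 1):
                    sub = seq[i:i + sub_len]
                    if sub in table:
                        table[sub] -= count
    return reduced
```

Given <A B: 10>, <B C: 7> and <A B C: 6>, the reduced counts are 4 for <A B> and 1 for <B C>, i.e. the occurrences that were not extended into <A B C>.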

* With approximately 100 000 instructions traced, (subject program FORTEN Treesort), the
run time was approximately 35 min. for sequences of length up to 20. Probably this could be
reduced considerably by coding the tree lookup routine in assembly code.

5.2.2 The pruning heuristics

The  results  presented  in  Section  5.3  were  obtained  using  the  following  pruning  algorithm: 
After  each  level  is  built,  each  of  the  new  nodes  is  examined  in  turn  and  the  heuristics  about 
to  be  described  are  applied  to  it.  Since  some  of  the  heuristics  involve  more  nodes  than  the 
one  thus  examined,  no  nodes  are  deleted  until  a second  pass  down  the  level  chain.  The  first 
pass  merely  marks  the  nodes  to  be  deleted,  using  the  extension  field  which  is  otherwise 
unused at the top level.

In  the  examples  below,  A,  B,  ...  denote  instructions,  J denotes  a jump  instruction.  A sequence 
and  its  count  (the  latter  often  omitted)  are  given  as:  <A  B C D E:  647>. 

Rule K:

All sequences whose count is less than 10% of the maximum count at the same level are
marked for deletion.

Heuristic  0: 

All  sequences  that  are  not  a "significant"  extension  of  their  leader  or  trailer  are  marked 
for  deletion.  Exceptions  are  made  for  sequences  of  all  the  same  instruction  and  for 
sequences  whose  count  is  at  least  1/50  of  the  number  of  instructions  in  the  subject 
program.  The  meaning  of  "significant"  depends  on  the  level.  A factor  is  defined  by  the 
following  table: 

Level:    2     3     4     >4

Factor:   1/8   1/4   1/2   3/4

All  sequences  whose  count  is  not  at  least  factor  times  the  count  of  both  its  leader  and  its 
trailer  are  marked.  (If  the  trailer  does  not  exist,  its  count  is  taken  to  be  0).  The  intent 
of  this  heuristic  is  to  isolate  the  common  part  of  partly  overlapping  sequences  as  the 
more  important.  Given  the  sequences  <A  B C:  500>,  <B  C D:  150>,  <C  D E:  150>  and 
<D  E F:  800>,  <B  C D>  would  not  be  marked,  but  <C  D E>  would  be. 
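
Rule K and heuristic 0 can be sketched as follows. The code is an illustration, not the original routines: the names rule_k and heuristic_0 are ours, counts are held in dictionaries keyed by opcode tuples, a pruned or missing leader or trailer counts as 0, and program_size stands for "the number of instructions in the subject program", whose exact meaning (static size or traced count) is an assumption left to the caller.

```python
from fractions import Fraction

# Factor table of heuristic 0; levels above 4 use 3/4.
FACTORS = {2: Fraction(1, 8), 3: Fraction(1, 4), 4: Fraction(1, 2)}


def rule_k(counts):
    """Mark sequences whose count is below 10% of the level maximum."""
    if not counts:
        return set()
    top = max(counts.values())
    return {seq for seq, n in counts.items() if n * 10 < top}


def heuristic_0(counts, shorter, level, program_size):
    """Mark sequences that are not a significant extension of leader and trailer.

    `counts` holds the sequences of this level, `shorter` those one level down.
    """
    factor = FACTORS.get(level, Fraction(3, 4))
    marked = set()
    for seq, n in counts.items():
        if len(set(seq)) == 1:          # exception: all the same instruction
            continue
        if n * 50 >= program_size:      # exception: at least 1/50 of the program
            continue
        leader = shorter.get(seq[:-1], 0)
        trailer = shorter.get(seq[1:], 0)
        if n < factor * leader or n < factor * trailer:
            marked.add(seq)
    return marked
```

With hypothetical level 2 counts <A B: 500>, <B C: 400>, <C D: 800> and level 3 counts of 150 for both <A B C> and <B C D>, only <B C D> is marked, since 150 is less than one quarter of the count of its trailer <C D>.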

Heuristic 1:

The intent of this heuristic is to detect loops. It is applied at levels ≥ 4. It is first
checked whether the first and last pairs of instructions in the sequence are the same. If
so, it is checked whether the sequence contains a jump instruction. If so, we assume we
have found a loop of length 2 less than the present level. Finally it is checked if the
same loop is represented elsewhere in the forest*. Whenever such a representation is
detected, it is marked for removal. Thus <A B C D E F A B> and <A B C D J E F G> are
not loops by this heuristic, but <A B C D J E A B> is a loop.
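
The loop test and the set of alternative representations of one loop can be written down directly. The Python below is a sketch with names of our own choosing (looks_like_loop, representations); the set of jump opcodes is supplied by the caller.

```python
def looks_like_loop(seq, jumps):
    """First pair equals last pair, and the sequence contains a jump."""
    if len(seq) < 4:
        return False
    return seq[:2] == seq[-2:] and any(op in jumps for op in seq)


def representations(seq):
    """All level L+2 representations of the loop of length L held in `seq`."""
    body = seq[:-2]                     # the loop itself, 2 shorter than seq
    found = set()
    for i in range(len(body)):
        turned = body[i:] + body[:i]    # start at a different instruction
        found.add(turned + turned[:2])
    return found
```

Applied to the three examples in the text with J as the only jump, the first two are rejected and <A B C D J E A B> is accepted; its loop of length 6 has 6 representations at level 8, one of which is <B C D J E A B C>.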


Heuristic 2:

This heuristic is applied at levels > 4. It attempts to detect if there are several nodes
representing subsequences of the same longer sequence yet to be built. As the top level
nodes are examined, chains are built linking nodes that are believed to represent such
sequences. Let

S1 = <C D E . . . F G>

be the sequence of length L that is currently under examination. We now examine all
sequences of form:

<X C D E . . . F>

for some X. Let S2 be one of these. S2 and its chain become the chain of S1 if:

a) Their counts differ by at most 3.

b) S1 was not in this chain before.

a) will ensure that the sequences are equally significant; b) that we do not delete all
representations of a loop. Note that S1 occurred later in the instruction stream than S2,
but is before it in the chain. Hence the sequence occurring earliest in the instruction
stream is the one which will have a null link, and therefore be kept. Thus for the
sequences <A B C D E>, <B C D E F> and <C D E F G>, the chain would go from
<C D E F G> to <B C D E F> to <A B C D E>, and the latter would be kept. In the previous
notation, if the chain consisted of S1 and S2, S1 would be deleted.
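
The chaining can be sketched in Python as below. This is a simplified reading of the description, not the original routine: sequences are dictionary keys rather than forest nodes, the first matching S2 is taken, and the function name heuristic_2 and the tolerance parameter are our own.

```python
def heuristic_2(counts, tolerance=3):
    """Return the sequences to delete: every chain member except its end."""
    link = {}                                   # S1 -> S2, the next in chain

    def chain_of(seq):
        members = []
        while seq is not None and seq not in members:
            members.append(seq)
            seq = link.get(seq)
        return members

    for s1, n1 in counts.items():
        for s2, n2 in counts.items():
            # S2 must have the form <X C D E ... F> for S1 = <C D E ... F G>.
            if s2 == s1 or s2[1:] != s1[:-1]:
                continue
            # a) counts differ by at most 3; b) S1 is not already in the chain.
            if abs(n1 - n2) <= tolerance and s1 not in chain_of(s2):
                link[s1] = s2
                break
    return set(link)                            # nodes with a non-null link
```

For <A B C D E>, <B C D E F> and <C D E F G> with nearly equal counts, the two later sequences are marked and <A B C D E> is kept. Condition b) is what keeps one representation of a loop alive: of the three rotations of a loop A B C at level 5, two are marked and one survives.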

Heuristic  3: 

This heuristic is applied at levels > 6, and is designed to detect and mark all but the most
frequent of those sequences at the level which overlap by a significant number of
instructions, at least 2/3 of the level number. For each sequence at level L > 6 (say
<A B C D E F G H>), we consider all extensions of its trailer to the level of L (such as
<B C D E F G H I>), and delete all but the one with the largest count. We then repeat the
process for the trailer of the trailer (i.e. <C D E F G H>), extending to level L again and
so on until we have reached the least overlap permitted.

* A loop of length L may be represented at L places in level L+2, each starting with a
different instruction of the loop.

Each of these heuristics is programmed as a routine, and called from one place in the
program, inside a pruning control routine. Hence it is easy to change the heuristics and the
order in which they are applied, or to add new heuristics.
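
The arrangement of one control routine calling interchangeable heuristics, with marking and deletion kept as separate passes, might look like this in Python. The function name prune and the convention that each heuristic returns the set of sequences it wants deleted are assumptions made for the sketch.

```python
def prune(counts, heuristics):
    """Apply the heuristics in order, then delete everything they marked."""
    marked = set()
    for heuristic in heuristics:        # first pass: mark only
        marked |= heuristic(counts)
    # second pass: delete the marked sequences
    return {seq: n for seq, n in counts.items() if seq not in marked}
```

Because nothing is deleted during the first pass, every heuristic sees the complete level, and changing the list passed as heuristics is all that is needed to add, drop or reorder rules.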


5.2.3  Sources  of  errors 

There are some problems associated with this method. Some of these could be avoided by
adjusting  the  parameters  to  the  heuristics,  but  this  is  not  sufficient.  We  now  present  the 
most  significant  of  these  problems,  and  propose  some  remedies. 


Sequence  overlap 

Because  of  the  heuristic  nature  of  the  pruning  algorithm,  we  have  no  guarantee  that  the 
sequences  at  any  level  are  really  disjoint.  Hence  the  final  reduced  counts  are  not  completely 
reliable.  In  particular  the  counts  for  subsequences  common  to  two  overlapping  longer 
sequences will be too low. This is clearly seen in all programs analyzed; several examples
are  shown  in  Section  5.3. 

To  remove  this  problem,  the  heuristics  for  detecting  overlaps  must  be  improved.  At  first 
sight,  the  obvious  way  is  to  shift  each  sequence  completely  out  of  the  sequence  detection 
mechanism  once  it  has  been  recorded,  rather  than  trying  to  detect  new  sequences  starting 
with  instructions  in  its  trailer.  This  assumes,  however,  that  the  sequence  just  recorded  is 
more  significant  than  those  omitted  as  a consequence  of  the  shift.  Hence  this  technique  can 
not  be  used  at  low  levels,  since  that  would  prevent  us  from  detecting  which  sequences  are 
significant  in  the  first  place.  Changing  to  this  technique  at  a higher  level  requires  great  care 
lest  we  extend  the  wrong  sequences  of  those  now  overlapping.  Hence  we  reject  this 
approach,  and  we  believe  the  way  to  go  must  be  to  improve  our  present  heuristics  and  the 
way they interact, and devise new heuristics in the same spirit.

We believe that not even the best of heuristics can completely avoid this problem. Hence we
suggest two more ways to relieve it. Firstly, the counts at each level may be printed after
the level is built, immediately before pruning, as well as at the end of the analysis. These
original counts may then be compared with the final reduced counts. We did this, and found
it a help in detecting significant sequences in general during the manual analysis described in
Section 5.3. In Section 5.3 we present both original and reduced results.

Secondly, one may decide from one run as outlined above, which sequences are important
enough or which results are wrong enough that exact counts are desirable. A second run
can then be done, with a slightly different program, collecting statistics on these sequences
only. This can be done in one pass since we know what to look for. Such a program should
be written to look for classes of sequences as, for instance, variants of a calling sequence,
possibly defined by a regular expression. We wrote no program for this.
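
Since no such program was written, the following Python fragment is purely a sketch of how the suggested second run could look. The opcode trace is joined into one string and a class of sequences is given as a regular expression; the function name count_class and the example pattern for a call with one or more pushed parameters are hypothetical.

```python
import re


def count_class(trace, pattern):
    """Count non-overlapping occurrences of a class of opcode sequences."""
    text = " ".join(trace)
    return sum(1 for _ in re.finditer(pattern, text))


# Hypothetical class: one or more PUSH, then PUSHJ JSP PUSH HRRZ.
CALL_WITH_ARGS = r"\b(?:PUSH )+PUSHJ JSP PUSH HRRZ\b"
# The same class, also admitting calls without pushed parameters.
ANY_CALL = r"\b(?:PUSH )*PUSHJ JSP PUSH HRRZ\b"
```

One pass over the trace is enough, and the counts are exact for the class because nothing is pruned.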


Dominating  loops 

Another problem is that of dominating loops. Our program tends to find long sequences,
sometimes  representing  whole  loops  of  the  subject  program,  rather  than  the  shorter 
sequences  that  are  more  frequent  and  which  could  reasonably  be  implemented  as 
instructions.  This  is  particularly  true  for  the  short  subject  programs,  where  one  or  a few 
loops  dominate  the  results.  The  situation  is  improved  when  subject  programs  of  a more 
representative  length  and  complexity  are  analyzed.  Further  improvement  can  most  probably 
be  achieved  by  strengthening  the  definition  of  "significant"  in  heuristic  0.  This  can  be  done 
either  by  increasing  the  "factor",  particularly  for  the  higher  levels,  or  we  may  introduce  new 
criteria  of  "significance".  One  such  could  be  to  compare  the  total  time  consumed  by  the 
sequences  in  question  rather  than  their  occurrence  counts.  Again  a factor  could  be  used  in  a 
way  similar  to  the  present  one. 


Interacting  heuristics 

A third  problem  is  the  interaction  of  the  heuristics,  particularly  heuristics  1 (loops)  and  2 
(subsequences  of  longer  sequences).  Probably  the  loop  heuristic  should  be  applied  last,  after 
all  deletions  resulting  from  the  other  heuristics  have  been  performed. 

Semantics  of  sequences 

Finally  there  is  the  problem  of  relating  the  sequences  back  to  the  subject  program  in 
question.  This  may  be  difficult  because  the  semantics  of  the  sequences  is  not  always 
obvious,  and  can  only  be  found  after  a careful  and  time  consuming  study  of  well  commented 
source  and  assembly  listings.  Also,  the  sequences  found  may  not  relate  easily  to  intuitively 
meaningful notions. This is related to the problem of dominating loops. The double length
arithmetic of Crout is a case in point. This occurs in a context such as

    for kx ← low step 1 until high do sum ← sum+A[lx,kx]*B[kx];

where sum is the double length variable. The double length addition is easily spotted by the
occurrence of the UFA♦ instruction, but it is embedded in a sequence of length 20 which also
involves array accessing and the enclosing loop.

More intuitive program elements can be brought out by:

a) Looking for more specific sequences as indicated above.

b) Improving the heuristics, possibly to start and break sequences at jumps more easily than
   now. However, an advantage of our present method is that it permits detection of
   significant sequences, crossing transfers of control, that might not have been suspected
   to be of importance♦♦. This property should not be lost.

c) Generating sequences longer than 20, and trying to keep the "earliest" one as described
   under heuristic 2.


5.3  Results  from  the  sequence  program 

Each result produced by our program consists of a sequence of operation codes, together
with its occurrence count and timing data computed from this count. Hence the results need
quite a bit of manual analysis to yield useful data. This analysis involves comparing with
assembly listings (possibly using interactive debugging systems to locate sequences),
comparing counts obtained before and after reduction or on different levels, etc. Good
knowledge of the subject program in question is an obvious advantage.

The  deficiencies  of  our  pruning  heuristics  and  the  way  they  interact,  as  described  in  Section 
5.2.3,  increase  the  difficulty  of  this  analysis.  We  have,  however,  made  an  attempt,  and 
present  the  results  below.  Due  to  the  manual  processing,  the  selection  of  sequences 
presented  is  necessarily  subjective. 


♦ Unnormalized floating add

♦♦ The BLISS calling sequences, the array access and UUO handling in BASIC programs, and
the thunk of ALGOL PERT are examples of this.

The results are presented by algorithm. The characteristics of each algorithm, as described
in Figure 3-2, rarely occur frequently enough to show up, but when they do we comment on
it. For each program, the maximal sequence length reached during analysis is given. In some
cases all sequences on the highest level reached were deleted by the pruning mechanism. In
those cases the highest level with significant sequences was one or two lower than the
highest level reached, as is indicated in parentheses. In some cases the sequences at the top
level(s) were rejected during the manual scan. This is not explicitly indicated.

Since  this  method  of  sequences  is  applicable  to  address  calculation  and  control  structures  as 
well  as  to  data  types  and  their  operators,  we  have  made  no  distinction  between  sequences  of 
these 3 types in the lists of sequences. For the same reason we present them with the bare
minimum  of  identifying  comment.  Evaluation  is  postponed  until  later  sections  in  the  relevant 
chapters:  5.4,  6.1  and  7.1.1. 

The  sequences  are  presented  in  a standard  format,  giving  the  occurrence  count  of  the 
sequence, the percentage of the total computed time consumed by it, and a single letter (B or
A)  designating  if  the  results  are  from  before  or  after  count  reduction.  This  is  followed  by 
the  sequence  itself.  Several  versions  of  the  same  or  largely  overlapping  sequences  have 
been  included  when  it  seemed  to  be  of  interest,  either  because  of  a much  larger  count  for  a 
subsequence,  because  of  a better  correspondence  with  an  intuitive  program  fragment,  to 
show  the  difference  due  to  count  reduction,  or  to  show  examples  of  bad  pruning.  Since  the 
sequences  overlap,  the  percentages  of  time  sometimes  add  up  to  more  than  100. 

Note  that  an  XCT  instruction  is  immediately  followed  by  its  target  instruction.  User  UUOs* 
are  given  in  numeric  (octal)  form,  followed  by  the  code  for  the  UUO  interpreter,  starting  at 
location  41.  Monitor  UUOs  are  given  in  their  octal  form,  followed  by  the  next  instruction  of 
the  program  itself  (see  Section  1.3). 


* A user UUO is an instruction (octal 01 through 37) which causes a trap to location 41 in the
user's memory. Since the subroutine thus called is user defined, the UUOs do not have
common mnemonic names. Monitor UUOs (octal 40 through 77) cause a trap to absolute
location 41 and are used for monitor calls.

5.3.1 The compilers

Since  these  programs  are  large  and  complex,  and  little  known  to  the  present  author,  the 
analysis  of  them  is  in  some  cases  less  thorough  than  desirable.  This  applies  in  particular  to 
the  two  FORTRAN  compilers.  In  the  other  cases  experts  were  available  for  consultation  and 
the results of the analysis are better.


ALGOL

Maximal sequence length: 11.

Seq.   Count  % Time  B/A  Sequence
(1)      170     2.9   A   JRST ILDB LDB AOS MOVE MOVE CAIN XCT CAIE JRST JSP
(2)      117     2.0   A   LDB MOVEM SKIPE MOVE MOVE MOVE IBP MOVEM AOS POPJ JRST
(3)      115     2.0   A   MOVE LDB MOVEI SKIPE XORB MOVE MOVEM IBP MOVEM PUSHJ SKIPE
(4)      216     3.5   A   CAME AOS POPJ IMULI ADDI SOJG PUSHJ ILDB
(5)      295     2.8   A   JRST CAIN ILDB AOS MOVE JRST
(6)      333     3.5   B   AOBJN LSHC ILDB AOS SKIPL
(7)      541     5.6   A   PUSHJ ILDB AOS CAME POPJ
(8)     1641     9.3   B   ILDB AOS
(9)      176     2.4   A   PUSHJ CAME ANDI JRST MOVE HRLI HRRM MOVEM MOVE MOVEM AOS
(10)     109     2.2   A   MOVE ANDI PUSHJ IDIVI TRNN ADDI POPJ TLNN MOVE MOVE ADDI
(11)    1442     2.5   B   TLNE JRST
(12)    1418     3.7   B   MOVE MOVEM
(13)     917     2.7   B   AOS CAME

Sequences  (1)  to  (8)  represent  various  forms  of  input  of  characters.  (9)  and  (10)  are 
concerned  with  outputting  relocatable  code.  (11)  shows  the  need  for  test  bit(s)  and  jump, 
(12) may be a memory to memory move, (13) is loop control.

BASIC

Maximal sequence length: 17.

Seq.   Count  % Time  B/A  Sequence
(1)     1104    20.7   A   SKIPE ILDB JRST CAIE CAIN CAIN CAIE CAIN CAIE CAIN CAIG CAIA
                           CAIGE IDPB SKIPE SOSLE AOJA
(2)      990     8.0   A   ILDB CAIN IDPB JRST
(3)      456     2.9   A   ILDB HLL TRNE TLNE JRST
(4)      402     2.1   A   HLL TRNE HRL TLNE POPJ
(5)      517     5.7   B   ILDB HLL TRNE TLNE POPJ
(6)      521     2.9   A   PUSHJ ILDB HLL
(7)      314     3.3   A   MOVEI PUSHJ MOVE ADD CAIE SKIPA CAMLE EXCH POPJ MOVEM
(8)      677     3.5   A   CAIGE JRST MOVEI ADD ASH CAMLE

(1) is a loop to move text lines from the TTY input buffer to the BASIC line buffer, character
by character. As the line is moved special characters, like VERTICAL TAB, LINE FEED,
RETURN, are removed or special action is taken on them. This loop could probably be
reduced to two instructions (ILDB JRST) at the space cost of a one word table entry per
character in the character set.
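
The trade suggested here, a chain of compare instructions replaced by one indexed jump through a table, can be illustrated in Python. The sketch is ours and makes several assumptions: a 128 entry table for a 7 bit character set, and particular actions for the three special characters (the text only says they are removed or treated specially). The names TABLE, copy_char, drop_char, end_line and move_line are hypothetical.

```python
def copy_char(out, ch):
    out.append(ch)                  # ordinary character: move it


def drop_char(out, ch):
    pass                            # assumed action: remove the character


def end_line(out, ch):
    out.append("\n")                # assumed action: terminate the line


# One table entry per character in the (assumed 7 bit) character set.
TABLE = [copy_char] * 128
TABLE[0x0B] = drop_char             # VERTICAL TAB
TABLE[0x0D] = drop_char             # RETURN
TABLE[0x0A] = end_line              # LINE FEED


def move_line(text):
    """Move a line character by character, dispatching through the table."""
    out = []
    for ch in text:                 # corresponds to ILDB: fetch next character
        TABLE[ord(ch)](out, ch)     # corresponds to JRST through the table
    return "".join(out)
```

The per character cost no longer depends on how many special characters are recognized, at the price of one table word per character code.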

Sequence  (2)  represents  the  loop  that  moves  a line  from  the  line  buffer  into  the  program 
text  area,  stopping  at  a return.  Further  sequences,  (3)  to  (6),  are  associated  with  the  routine 
that  reads  the  next  character,  sets  appropriate  flags  depending  on  its  properties,  and  ignores 
blanks. 

The  main  data  structure  of  BASIC  is  the  roll,  which  essentially  is  a contiguous  but  dynamically 
relocatable  memory  area.  The  compiler  has  a fixed  number  of  rolls,  which  are  packed  to 
conserve  space  and  occasionally  have  to  be  relocated  in  order  to  let  one  of  them  expand. 
The  sequences  (7)  and  (8)  relate  to  this  data  structure.  The  first  of  these  adds  a data  item 
to the end of a roll, first checking if there is room. The second loop performs binary search
in an ordered roll.


BLISS

Maximal sequence length: 10 (8).

Seq.   Count  % Time  B/A  Sequence
(1)    15763    14.3   A   PUSH PUSHJ JSP PUSH HRRZ
(2)    10462     7.2   A   JRST POP POPJ SUB
(3)     3724     3.5   A   JRST POP POP POPJ SUB
(4)     4897     3.5   A   PUSH HRRZ PUSH JRST
(5)     4489     3.0   A   PUSH PUSH PUSHJ
(6)     3264     2.4   A   PUSH PUSH PUSH
(7)    18275    12.1   B   PUSHJ JSP PUSH HRRZ
(8)    12256     6.9   B   JSP PUSH HRRZ JRST

All  these  represent  the  routine  entry  and  exit  mechanism,  which  probably  accounts  for  at 
least  25%  of  the  compilation  time.  Note  that  these  sequences  have  considerable  overlap,  and 
that  (7)  and  (8)  are  from  before  reduction. 


FORFOR

Maximal sequence length: 10 (8).

Seq.   Count  % Time  B/A  Sequence
(1)    17484    11.3   A   AOJA MOVE HLRZ TRNN JRST
(2)    14555     9.9   A   AOJA MOVE HLRZ TRNN TRZE
(3)     6390     5.9   A   HLRZ TRNN TRZE JUMPN TRZE AOJA MOVE
(4)     5750     7.0   A   HLRZ CAIN ADD HRRZM HRRZ ADD HRRZM SOJE
(5)     4411     5.1   A   PUSHJ LDB ANDI MOVEI HLRZ CAIG
(6)     5635     5.0   A   SOJGE HLRZ CAIN ADD HRRZM HRRZ
(7)    26907     5.9   B   TRNN JRST
(8)    38569    10.8   B   HLRZ TRNN

This compiler is highly interpretive, simulating a one or few register machine on the 16
register PDP-10. Sequences (1) to (3) are associated with the "instruction fetch" cycle of
this interpreted machine.


(4) to (6) are associated with roll maintenance. We believe that a roll in FORFOR is
approximately the same as in BASIC (see under BASIC above), but since no FORTRAN expert
is available, and the assembly listing is poorly commented, we have not been able to verify
this.

Some further short sequences, (7) and (8), with large counts and time were spotted in the
output from before count reduction. They clearly demonstrate the need for a test bit and
jump instruction.


FORTEN

Maximal sequence length: 20.

Seq.   Count  % Time  B/A  Sequence
(1)      571     3.2   A   CAIG POPJ CAIE JRST CAIE JRST MOVE TRNE CAIE JRST CAIN AOS
                           CAMG JRST PUSHJ MOVE CAMGE JRST AOS MOVEM
(2)      949     3.2   A   CAIG POPJ JUMPE MOVE TRNN JRST CAIE JRST MOVE TRNN JRST SETZ
                           POPJ
(3)     4960     4.7   B   POP POPJ
(4)     2532     4.1   B   PUSHJ JSP PUSH HRRZ JRST
(5)     2403     4.7   B   PUSHJ JSP PUSH HRRZ PUSH
(6)     1936     5.5   A   PUSHJ SOSG CAIA ILDB MOVEI CAIG POPJ

(1) and (2) show the need for good testing instructions. (3) to (6) are from the BLISS routine
entry and exit sequences (FORTEN is written in BLISS). From these results it is reasonable to
assume that the routine call administration consumes at least 15% of the time in FORTEN. (6)
represents reading a character from input, with some additional administration.


5.3.2  SEC 

Most  of  the  sequences  of  this  program  represent  loops  of  considerable  length.  Usually 
several  matrix  accesses  can  be  observed  in  each  loop,  but  these  are  not  brought  out 
separately  after  count  reduction. 


FORFOR SEC

Maximal sequence length: 20.

Seq.   Count  % Time  B/A  Sequence
(1)                    A   CAMGE AOJA MOVE MOVEI IMUL MOVE ADD MOVE ADD FMPR MOVE ADD
                           MOVE ADD FMPR FADR MOVEM MOVE MOVEI IMUL
(2)     2340     5.5   A   CAMGE AOJA MOVEI IMUL ADD MOVE MOVE ADD FMPR MOVE ADD FADRM
(3)     2987     3.0   A   MOVEM CAMGE AOJA MOVE MOVEI IMUL
(4)     9390     9.6   A   MOVE ADD MOVE ADD FMPR
(5)     8777     8.0   A   MOVEI IMUL MOVE ADD MOVE
(6)    11072     8.7   A   MOVEI IMUL ADD MOVE
(7)    15364    14.0   B   ADD MOVE ADD FMPR
(8)    12499    11.2   B   MOVE ADD FMPR MOVE
(9)    20181    15.7   B   MOVE ADD FMPR
(10)  2-:i28    14.9   B   MOVEI IMUL ADD

(1) and (2) are loops as mentioned, (3) to (5) are sections of such loops, with loop control and
matrix access showing. The original count for (2) was 2980, and 7% of the time was
consumed by it. The original time was 15.7% for (4), 10.7% for (5), 12.8% for (6). (6) is a load
of a matrix element. (7) to (10) are original results. The MOVE ADD OPERATE sequence is
access to formal vector, (10) is the matrix accessing sequence.


FORTEN SEC

Maximal sequence length: 20.

Seq.   Count  % Time  B/A  Sequence
(1)     2987    12.7   A   ADD MOVN FMPR MOVE FMPR FADR MOVEM MOVEI IMUL ADD ADD MOVE
                           MOVEM ADDI AOJL MOVEI IMUL MOVE ADD ADD
(2)     2980     7.3   A   FADRM ADDI AOJL MOVE ADD MOVE ADD MOVEI IMUL ADD MOVE FMPR
(3)     4760     8.0   A   MOVE FMPR FADR MOVEM MOVEI IMUL
(4)     5940     3.9   A   MOVEM MOVE MOVEM MOVE MOVEM
(5)    21006     5.4   B   MOVE MOVEM
(6)    11523     6.1   A   MOVE ADD ADD MOVE
(7)    10562     9.0   A   MOVEI IMUL ADD MOVE
(8)    34831    20.2   B   MOVEI IMUL
(9)    26790    19.4   B   MOVEI IMUL ADD
(10)    7758     8.4   A   FMPR FADRM ADDI AOJL
(11)    5134     5.7   A   MOVE FMPR FADRM ADDI
(12)   12337    10.3   A   ADD MOVE FMPR
(13)   22689    15.7   B   MOVE FMPR

DATA TYPES AND OPERATORS


116 


Sequence (1) here is obviously the same loop as (1) under SEC40. (2) to (4) represent
similar structures. The latter may indicate the need for a memory to memory move, as
illustrated further by (5). (6) contains vector access. (7) is matrix element load. The
importance of the matrix data structure is further illustrated by (8) and (9), from before
reduction. (10) to (12) are of doubtful origin. (10) and (11) might represent some inner
product like loop, (12) consumed 12.8% of the time using the values from before reduction.
(13) would be considerably more efficiently executed on a two address design. The MOVE
ADD OPERATE sequence represents the use of a formal vector and is present in several of
the sequences.


5.3.3  Aitken 

This  algorithm  consists  of  two  phases,  first  a search  in  the  vector  of  abscissae  to  locate  the 
interval  where  interpolation  is  to  take  place,  then  the  interpolation  itself  which  is  somewhat 
similar  to  successive  calculations  of  two  by  two  determinants,  controlled  by  two  nested  loops. 
Depending  on  implementation  the  local  data  are  a two  dimensional  array  or  some  number  of 
vectors.  Also  some  implementations  work  directly  on  the  parameter  vectors  defining  the 
abscissae  and  ordinates,  others  move  the  values  needed  to  local  vectors  thereby  saving 
accessing  code.  Two  implementations  perform  arithmetic  on  the  values  while  so  moved.  All 
these  variations  show  up  clearly  in  the  results  to  be  presented. 

The  surrounding  program,  which  sets  up  the  vectors  of  function  (logarithm)  values,  and  calls 
AITKEN  with  different  parameters,  does  not  show  up  in  the  results  from  the  most  time 
consuming  implementations  of  Aitken,  but  is  very  conspicuous  in  the  results  from  the  more 
efficient  versions. 


Aitken - E

Maximal sequence length: 20.

Seq.   Count  % Time  B/A  Sequence
(1)      200     8.2   A   FAD MOVE FAD FDV MOVE FMP MOVE FMP FAD FMP FAD
(2)      200    11.9   A   MOVE FMP MOVE FMP FAD FMP FAD FMP FAD MOVE FMPR JRST
                           POP POP POP POP POP POPJ SUB MOVEM
(3)      198     6.1   A   CAMG JRST AOJA CAILE MOVE MOVEM JUMPLE MOVE CAML PUSH PUSHJ JSP
                           PUSH HRRZ PUSH PUSH PUSH PUSH JRST MOVE
(4)      196     6.7   A   MOVEM MOVE FADRB MOVE FMPRB CAMG JRST MOVE CAMG JRST AOJA CAILE
                           MOVE MOVEM JUMPLE MOVE CAML PUSH PUSHJ JSP
(5)     1485    49.4   A   MOVE FMPR MOVE FMPR FSBR MOVE FSBR FDVR MOVEM SOJGE
(6)      405     8.7   A   MOVE SOJ JUMPL MOVE FMPR MOVE FMPR FSBR
(7)      324     4.7   A   MOVEM FMPR SOJGE SOJG MOVE SOJ JUMPL MOVE
(8)      255     3.2   A   ASH CAML JRST MOVE JRST MOVE AOJ CAMG MOVE ADD
(9)      405     5.4   A   MOVE MOVEM FSBR MOVEM MOVE MOVEM AOJ AOJ SOJGE

Sequences (1) to (4) are from the controlling program, and represent the internals of LOG, its entry and exit, and the controlling loop. The first two and the last two overlap. As can be seen, the routine entry and exit sequences are dominant, particularly the saving and restoring of local registers. There is also some indication of the use of Horner's rule.


Sequences (5) to (7) represent the determinant-like loop, the first being the inner loop and the next two the outer loop, partly overlapping the inner. Binary search in the abscissae vector is represented by (8), and the vector move by (9). The original result for (9) was 6.4% of the computed time. Addresses of the vector elements are used directly in the code, to save address calculation.


Aitken  - B 

Maximal  sequence  length:  14  (12). 


Seq. 

Count 

% Time

B/A 

Sequence 

(1) 

1485 

53.8 

A 

MOVE 

MOVE 

FSBR 

FSBR 

(2) 

405 

6.9 

A 

MOVE 

FSBR 

SOJ 

(3)

324 

3.4 

A 

MOVEM 

FSBR 

SOJG 

(4) 

630 

6.4 

A 

MOVE 

CAML 

SUB 

(5) 

405 

3.3 

A 

AOJ 

AOJ 

FMPR 

FDVR 

MOVE 

MOVEM 

FSBR 

SOJGE 

FMPR 

FSBR 

JUMPL 

MOVE 

FSBR 

FMPR 

MOVE 

SOJG 

MOVE 

SOJ 

JUMPL 

MOVE 

CAIG 

MOVE 

ADD 

ASH 

MOVE 

SOJGE 

MOVE 

MOVEM 

MOVE 

MOVEM 



(6)   282    3.9   A   POP POP POP POP POP POPJ
(7)   444    3.7   A   PUSH PUSH PUSH PUSH
(8)   400    6.4   A   FMP FAD FMP FAD

The routine uses the addresses of the formal vectors directly, hence there is some accessing code. The determinant loop and the partly overlapping sequences from its enclosing loop are almost as in the E version, as seen in (1) to (3). The binary search shows up as (4). The vector move of formal to local is (5); its original time was 3.8%. Procedure entry and exit are shown by (6) and (7). From the initialization we have (8), which is Horner's rule in unrounded arithmetic.
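
As a reminder of why Horner's rule appears as alternating multiply and add instructions, here is a minimal sketch. The function name `horner` and the coefficient ordering (highest power first) are assumptions made for the example; the measured routine is not reproduced in the text.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0]*x**n + ... + coeffs[n] at x."""
    acc = 0.0
    for c in coeffs:
        # One multiply followed by one add per coefficient, which is the
        # pattern that shows up as FMP, FAD, FMP, FAD in the sequences.
        acc = acc * x + c
    return acc
```

For example, `horner([2.0, -3.0, 1.0], 3.0)` evaluates 2x^2 - 3x + 1 at x = 3 with two multiplications that matter and no explicit powers.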


Aitken  - A 

Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)   1320   38.4   A   CAMLE MOVE FMPR MOVE FMPR FSBR MOVE FSBR FDVR MOVEM AOJA
(2)    432    3.3   A   MOVE AOJ MOVE SOJ MOVEM CAMLE MOVE
(3)    288   11.0   A   CAMLE JRST AOJA CAMLE MOVE AOJ MOVE SOJ MOVEM CAMLE
                        MOVE FMPR MOVE FMPR FSBR MOVE FSBR FDVR MOVEM AOJA
(4)   1920    8.6   A   MOVE MOVEM AOJA CAMLE
(5)    261    5.3   A   MOVE CAMLE SKIPA MOVE ADD ASH MOVE JRST MOVE SUB
                        CAIG MOVE ADD MOVE CAME JRST MOVE ADD
(6)    540    7.4   A   MOVE SUB CAIG MOVE ADD MOVE CAME JRST MOVE ADD MOVE CAMLE
(7)    360    7.2   A   CAMLE AOJ MOVE ADD MOVE MOVEM MOVE ADD MOVE MOVEM
                        MOVE ADD MOVE FSBR MOVEM AOJA
(8)    282    3.5   A   POP POP POP POP POP POPJ SUB
(9)    400    7.2   A   FMP FAD FMP FAD
(10)  3433    7.3   B   AOJA CAMLE
(11)  3078    7.5   B   MOVE ADD
(12)  2538    9.1   B   MOVE ADD MOVE

The determinant loop is represented by (1) to (3); the two latter represent the outer loop and also overlap the first, which is the inner loop. (4) is the own to own vector move in the outer loop. From the binary search we have (5) and (6). The formal to local vector move is (7). The initialization phase shows up as routine exit and Horner's rule, as shown by (8) and (9). (10) to (12) show the original results for loop control and access to formal vectors.


Aitken - G


Maximal sequence length: 14 (12).


Seq.  Count  % Time  B/A  Sequence

(1)    6336   41.9   A   MOVE ADD MOVE MOVEI CAML MOVE MOVEI CAMG AND TRNN AOS JRST
(2)    2970   17.0   A   MOVE ADD MOVE MOVE ADD FMPR
(3)   18837   23.5   B   MOVE ADD
(4)   11439   20.9   B   MOVE ADD MOVE
(5)    2970   11.5   B   MOVE ADD FMPR
(6)    1971    3.7   B   MOVE ADD MOVEM
(7)    1485    3.9   B   MOVE ADD FSBR

The  search  in  the  vector  is  linear,  and  represented  by  (1).  The  determinant  loop  is  not 
represented  significantly  except  for  a short  section  which  occurs  twice  in  the  loop  and  hence 
overrides  the  accumulation  of  longer  sequences.  This  is  (2),  which  represents  multiplication 
of  two  vector  elements.  Other  fractions  of  this  loop  are  present  but  not  significantly.  The 
access  to  a local  vector  is  of  the  format  MOVE,  ADD,  OPERATE.  This  is  shown  in  (3)  to  (7), 
from  before  reduction. 


Aitken  - L 

Maximal  sequence  length:  18. 


Seq.  Count  % Time  B/A  Sequence

(1)    1485   31.0   A   MOVE SOJ IMULI ADD MOVE FSBR FMPR MOVE FSBR ADD
                         FMPR FSBR MOVE FSBR FDVR MOVEM AOJA CAMLE
(2)    1485   17.0   A   MOVE IMULI MOVE ADD MOVE SOJ IMULI ADD MOVE FSBR FMPR
(3)    6264   40.5   A   CAMLE MOVE ADD MOVE CAME JRST MOVE ADD MOVE CAMGE JRST AOJA
(4)    9127    9.5   B   AOJA CAMLE
(5)   15219   18.1   B   MOVE ADD
(6)   14247   24.8   B   MOVE ADD MOVE
(7)    1971    7.1   B   MOVE IMULI MOVE ADD

The sequences (1) and (2) represent the determinant loop. The (linear) vector search is shown by (3). The original results representing loop control and vector access are shown in sequences (4) to (6). (7) represents access to a matrix.


5.3.4  The CALGO algorithms, initial remarks

Before  presenting  the  result  for  the  CALGO  algorithms,  we  make  some  general  remarks  about 
the  languages  and  their  peculiarities:  For  matrix  access  the  present  ALGOL  implementation 
uses  Iliffe  vectors,  whereas  the  other  systems  use  multiplicative  methods. 
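
The difference between the two matrix access methods can be illustrated with a small sketch. It assumes row-major storage in one flat list; the function names and the layout are assumptions made for the example and do not describe the actual run time systems.

```python
def make_iliffe(rows, cols):
    """Return (store, row_start): flat storage plus an Iliffe vector of row offsets."""
    store = [0.0] * (rows * cols)
    row_start = [i * cols for i in range(rows)]   # one entry per row
    return store, row_start


def fetch_iliffe(store, row_start, i, j):
    # Iliffe vector: look up the start of row i, then add the column index.
    return store[row_start[i] + j]


def fetch_multiplicative(store, cols, i, j):
    # Multiplicative method: compute the offset with an integer multiply.
    return store[i * cols + j]
```

The Iliffe form trades the multiplication for one extra memory reference per access, at the cost of storing the vector of row offsets.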

In ALGOL programs a complicated run time system is used to implement the parameter mechanism (call by name), space allocation and block structure, and to check the legality of operations. This is particularly noticeable in routine calls and parameter access. The run time system sequences are easily detectable by the bit manipulating instructions they contain.

BASIC uses a similar run time system. User UUOs are used to call the routines of this system; this holds even for the routines that do vector and matrix access. Furthermore, all arithmetic is in floating point, so the indexes must be truncated to integers. The routine that does this also checks the result against the upper bound. The code to fetch and store vector elements is the same except for one MOVEI at the beginning, which loads a register with a MOVE, MOVEM or MOVNM instruction. This instruction is XCT'd from that register at the end of the access routine. The code for matrix access overlaps that of vector access to a large extent.
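
The shared access routine described above can be mimicked in a high level sketch: the caller passes in the operation to perform, and the common code truncates the index, checks it, and finally executes the operation. All names below are hypothetical, and the check against a negative index is an added assumption (the text mentions only the upper bound).

```python
def make_accessor(vector, upper_bound):
    """Build the common access routine for one vector (hypothetical helper)."""
    def access(index, operation, value=None):
        i = int(index)                       # truncate the floating point index
        if i < 0 or i > upper_bound:         # bound check done by the same routine
            raise IndexError("index out of bounds")
        return operation(vector, i, value)   # plays the role of the final XCT
    return access


def fetch(vector, i, _value):
    return vector[i]


def store(vector, i, value):
    vector[i] = value


def store_negated(vector, i, value):
    vector[i] = -value
```

The three operations stand in for the MOVE, MOVEM and MOVNM instructions that are loaded into the register before the common code is entered.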


5.3.5  Bairstow 


ALGOL  Bairstow 

Maximal  sequence  length:  11  (10). 


Seq.  Count  % Time  B/A  Sequence

(1)    345    9.6   A   JRST AOS CAMLE MOVE ADD MOVE MOVE ADD MOVE FMPR
(2)   1001   24.5   A   MOVE ADD MOVE FMPR FSBR MOVE ADD
(3)    535   11.7   A   MOVE ADD MOVE MOVE ADD MOVE FMPR
(4)    516    6.4   A   MOVE ADD MOVE JRST AOS CAMLE
(5)    470    6.0   A   ADD MOVEM MOVE ADD MOVE MOVE
(6)    518    5.8   A   FSBR MOVE ADD MOVEM
(7)   3085   19.5   B   MOVE ADD MOVE
(8)   1025    6.6   B   MOVE ADD MOVEM
(9)   4710   20.3   B   MOVE ADD
(10)   637    3.8   B   JRST AOS CAMLE

Sequences (1) to (6) show mainly vector access (MOVE ADD OPERATE) and loop control (JRST AOS CAMLE) with some other operations intermixed. The results for the vector access and loop control before reduction are given as (7) to (10).


BASIC  Bairstow 

Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)   3488   35.7   A   MOVEI MOVE HRRZ TRNN JRST PUSHJ MOVE AOS MOVE FAD
                        TLZ CAMGE POPJ ADD ADD XCT MOVE POPJ
(2)   1138   10.2   A   MOVEI MOVE HRRZ TRNN JRST PUSHJ MOVE AOS MOVE FAD
                        TLZ CAMGE POPJ ADD ADD XCT
(3)   1171    4.9   A   JSR JRST PUSH LDB JRST JRST
(4)   4626    9.7   B   MOVE FAD TLZ

Sequence (1) gives all of the code for vector fetch, except the initial MOVEI. (2) gives the same for vector store, but truncated at the XCT instruction. The counts are correct, as can be checked against the count for the appropriate UUOs. (3) is the general UUO handler. Its original count was 4659, representing 19.5% of the time. (4) represents the conversion of indices to fixed point.


BLISS  Bairstow 

Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)     90    5.1   A   TRNN JRST SKIPE PUSH PUSHJ JSP PUSH HRRZ JRST 051
                        SETZ JRST POP POPJ SUB JRST MOVEI SUB JRST POP
(2)    452   22.4   A   MOVE MOVEM FMPR MOVE FSBR MOVE FMPR FSBR
(3)    370    9.1   A   MOVE FMPR FADR MOVEM
(4)    329    7.8   A   MOVEM AOJA CAMLE MOVE FMPR
(5)    263    6.6   A   FSBR MOVEM MOVE FMPR
(6)    263    6.6   A   FMPR FSBR MOVEM MOVE
(7)    276    5.3   B   PUSH PUSHJ JSP PUSH HRRZ JRST
(8)    376    4.4   B   POP POPJ SUB
(9)    819    4.3   B   AOJA CAMLE

Sequence (1) and several overlapping sequences not listed represent output to TTY. (2) is synthetic division of a polynomial by a quadratic term. (3) is an expression of the form D[j] ← D[j]+R*D[j-1]. (4) to (6) are various parts of the important loops. (7) and (8) represent routine calling overhead. (9) is loop control.
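
The two arithmetic kernels named above can be sketched as follows. The recurrences are the standard ones for synthetic division; the function names, the divisor form x^2 + p*x + q and the coefficient ordering (highest power first) are assumptions made for the example, since the measured source code is not reproduced here.

```python
def divide_by_quadratic(a, p, q):
    """Synthetic division of a[0]*x**n + ... + a[n] by x**2 + p*x + q.

    Returns the list b; b[:-2] holds the quotient coefficients and the last
    two entries determine the remainder b[n-1]*(x + p) + b[n].
    """
    b = []
    for j, coeff in enumerate(a):
        prev1 = b[j - 1] if j >= 1 else 0.0
        prev2 = b[j - 2] if j >= 2 else 0.0
        # Two multiplies and two subtracts per coefficient.
        b.append(coeff - p * prev1 - q * prev2)
    return b


def accumulate_in_place(d, r):
    """Apply D[j] <- D[j] + R*D[j-1] for j = 1, 2, ... in place."""
    for j in range(1, len(d)):
        d[j] = d[j] + r * d[j - 1]
    return d
```

For instance, dividing x^3 + 2x^2 + 3x + 2 by x^2 + x + 2 gives the quotient x + 1 with zero remainder.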

FORFOR  Bairstow 

Maximal  sequence  length:  18(16). 


Seq.  Count  % Time  B/A  Sequence

(1)    181   18.4   A   FMPR FADR MOVNM MOVE FMPR FSBR MOVE FMPR FADR MOVNM
                        CAMGE AOJA MOVE FMPR FSBR MOVE
(2)    181    9.7   A   FADR MOVNM CAMGE AOJA MOVE FMPR FSBR MOVE FMPR
(3)    226   10.9   A   MOVE FMPR FSBR MOVE FMPR FADR MOVNM
(4)    148    8.3   A   FMPR FADR MOVEM MOVE FMPR FADR MOVEM CAMGE AOJA MOVE
(5)    492    2.6   B   CAMGE AOJA
(6)    859   19.2   B   MOVE FMPR FADR
(7)    581   13.1   B   MOVE FMPR FSBR

Sequence (1) is the full loop of the synthetic division. (2) and (3) are probably sections of this loop which remain owing to bad pruning. (4) is the same as (3) in BLISS Bairstow, but the full loop. (5) is loop control; (6) and (7) are time-consuming combinations of arithmetic operations.


FORTEN  Bairstow 
Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)     44    4.4   A   MOVEM MOVEI PUSHJ CAIA MOVE JUMPG CAMN ASHC ADDI MOVSM
                        MOVSI FADM ASH TLC FAD MOVE FAD FDV MOVEM FMP
(2)    148    8.9   A   FADR MOVEM MOVE FMPR FADR MOVEM ADDI AOJL MOVE FMPR
(3)    222    6.1   A   MOVE FMPR FADR MOVEM
(4)    452   23.7   A   MOVN FMPR FADR MOVN FMPR FADR MOVEM
(5)    181    5.9   A   ADDI AOJL MOVN FMPR FADR MOVN
(6)    226    6.6   A   FMPR FADR MOVEM ADDI AOJL


BASIC Crout

Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)   2811   36.7   A   FAD TLZ CAMGE POPJ HRRZ IMUL HRRZ PUSHJ MOVE AOS
                        MOVE FAD TLZ CAMGE POPJ ADD ADD XCT MOVE POPJ
(2)   2811   36.5   A   JSR JRST PUSH LDB JRST JRST MOVSI MOVE HLRZ PUSHJ
                        MOVE AOS MOVE FAD TLZ CAMGE POPJ HRRZ IMUL HRRZ
(3)   1001   13.7   A   MOVE POPJ FMPR FADR MOVEM MOVEI MOVE FADR JRST CAMLE
                        MOVEM MOVEI MOVE MOVEM 006 JSR JRST PUSH LDB JRST
(4)   1239    4.1   A   MOVEI MOVE FADR JRST CAMLE MOVEM
(5)    918    3.5   A   JSR JRST PUSH LDB JRST JRST
(6)   7126   13.7   B   MOVE FAD TLZ
(7)   7126   34.7   B   PUSHJ POPJ MOVE AOS MOVE FAD TLZ CAMGE

Sequences (1) and (2) are largely overlapping parts of the array accessing code. (3) contains most of the general UUO handler in the context of one of the inner product loops, with access to a matrix and some arithmetic. (4) is loop control. Its original time was 5.1% of the total. (5) is the general UUO handler. Its original time was 15.3%. (6) is the abbreviated truncation of indices to integer; (7) shows this in the context of the routine that also checks for index overflow.


BLISS Crout

Maximal sequence length: 20.


Seq.  Count  % Time  B/A  Sequence

(1)   2109   47.9   A   CAMLE MOVE IMULI ADD ADD MOVE IMULI ADD ADD MOVE
                        FMPR FADRB AOJA
(2)    361   11.0   A   CAMLE MOVE IMULI ADD ADD MOVE IMULI ADD ADD MOVE
                        FMPR FADRB AOJA CAMLE JRST MOVE SUB JRST POP POP
(3)   2451   39.8   A   ADD MOVE FMPRB FADRB AOJA CAMLE MOVE IMULI ADD
(4)    865    4.2   A   PUSH PUSH PUSH
(5)    424    2.8   B   PUSH PUSH PUSH PUSH
(6)   6010   38.8   B   MOVE IMULI ADD ADD
(7)   5530   41.1   B   MOVE IMULI ADD ADD MOVE
(8)    400    3.0   B   MOVE IMULI ADD ADD MOVN


Sequence  (1)  is  the  call  of  ALOG  in  the  beginning  of  the  program,  with  some  environment.  (2) 
is  the  same  as  (3)  in  BLISS  Bairstow.  (3)  is  part  of  the  same  and  reflects  bad  pruning.  (4)  to 
(6)  are  from  the  synthetic  division  and  again  reflect  bad  pruning. 


5.3.6  Crout 


ALGOL  Crout 


Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)   1282   19.6   A   AOBJP EXCH MOVE ROTC LSH ANDI
(2)   1001   21.3   A   ADD PUSHJ FMPR UFA MOVEM JRST
(3)    819   14.3   A   MOVEM ADD JRST MOVE MOVE JSP
(4)   1585    3.6   B   JRST AOS
(5)   7351   12.0   A   MOVE ADD
(6)   1225    6.2   A   MOVE ADD
(7)   3532   11.5   A   MOVE ADD
(8)   1646    6.6   A   MOVE ADD
(9)   1015    6.8   A   MOVE ADD

MOVE ADDI HLLZ SETZB ROTC ROT ANDI HLRZ HRRZ ANDI LSH CAIN JRST
HLRZ MOVE JSP MOVEI JRST MOVEI FADL UFA FADL POPJ MOVEM AOS CAMLE
MOVE ADD AOS CAMLE MOVE ADD MOVE MOVE ADD MOVE ADD FMPR MOVEI
JRST MOVEI PUSHJ CAMLE FMPR MOVE ADD MOVE ADD MOVE MOVE ADD FMPR

The run time system shows up prominently, as in sequence (1) and others. The double precision add or conversion is (2); part of an inner product loop with a call to a double precision routine is shown in (3). (4) is loop control; (5) to (9) are various representations of the matrix and vector access code: (5) is the basic vector access, (7) the basic matrix access, using Iliffe vectors. (6), (8) and (9) are common contexts for these accesses.


(9)   3460    6.3   B   AOJA CAMLE

Sequence (1) shows the inner product loop (two matrices). (2) shows the same loop with its exit, and exit from the routine. (3) is unknown, maybe part of both inner product loops. (4) and (5) show parts of routine entry, (6) to (8) are forms of the matrix access, and (9) is loop control.


FORFOR  Crout 

Maximal sequence length: 20.

Seq.  Count  % Time  B/A  Sequence

(1)   1225   24.2   A   JFCL FMPR JFCL UFA JFCL FMPL JFCL UFA FADL POP
                        POP POPJ MOVEM MOVEM MOVEI PUSHJ PUSH PUSH UFA FADL
(2)   1015   15.3   A   MOVE MOVEI MOVEM MOVE IMUL MOVE IMUL ADD MOVE ADD
                        MOVN ADD MOVE MOVEI MOVEI MOVEM MOVEM MOVEM PUSHJ PUSH
(3)   2466   18.7   B   MOVE IMUL MOVE IMUL ADD MOVE ADD

The  double  precision  arithmetic  is  shown  in  (1),  the  inner  product  loop  in  (2).  (3)  is  access  to 
a formal  matrix. 


FORTEN  Crout 

Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)    819   29.4   A   ADDI AOJL MOVE IMUL ADD ADD MOVE
                        IMUL ADD ADD MOVE FMPR MOVEI MOVEI
                        PUSHJ PUSH PUSH PUSH UFA FADL
(2)    511    3.3   A   MOVE MOVE MOVE MOVE MOVE MOVE
(3)    256    1.7   B   MOVEM MOVEM MOVEM MOVEM MOVEM MOVEM
(4)    735    4.9   B   MOVEM MOVE MOVEM MOVE MOVEM MOVE
(5)   2390   21.2   B   MOVE IMUL ADD ADD MOVE
(6)   2796   21.8   B   MOVE IMUL ADD ADD
(7)   1345    2.1   B   ADDI AOJL

(1) is an inner product loop with loop control, access to two matrices and entry to the double precision routine. (2) to (4) indicate the need for a wider variety of moves. (2) and (3) are from routine entry and exit sequences. (5) and (6) are matrix access. (7) is loop control.


5.3.7  Treesort

This algorithm was chosen because it contains packed data and linked structures. It is the shortest of our subject programs, and the WHILE loop dominates all the results. The only interesting feature is the different ways in which the five systems pack information into words. In each case we tried to write the program in a way that the system in question was known to handle efficiently. In the case of FORFOR, therefore, we used division by an octal constant that is a power of 2 to unpack, since this was known to generate a shift. Similarly, in the BLISS version we used the byte pointer construct, which generates halfword instructions.
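
The equivalence that the FORFOR version relies on can be shown with a small sketch. The field layout (two 18-bit halves of a 36-bit word, non-negative values only) and the function names are assumptions made for the example; the packing actually used by each program is not given in the text.

```python
HALF = 0o1000000          # octal constant, equal to 2**18


def pack(left, right):
    """Pack two non-negative 18-bit fields into one word."""
    return left * HALF + right


def unpack_by_division(word):
    # Division by a power of 2, as written in the source program.
    return word // HALF, word % HALF


def unpack_by_shift(word):
    # What a compiler can generate instead: a shift and a mask.
    return word >> 18, word & (HALF - 1)
```

Both unpacking functions return the same pair for any non-negative word, which is why a compiler that recognizes the power of 2 can replace the division with a shift.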

The BASIC result is not compatible with the others for two reasons: a shorter vector was sorted, to reduce execution time, and the vector fetch is very different from that in the other systems, as stated elsewhere.

The  results  were: 

ALGOL  Treesort: 

(1)  8574  18.23  B MOVE  IDIVI 

BASIC  Treesort: 

(2)  2514  6.5  B FDVR 

BLISS  Treesort: 

(3)  8174  7.5  B HLRZ 

FORFOR  Treesort: 

(4)  8974  16.0  B MOVE  LSH 

FORTEN  Treesort: 

(5)  8174  45.0  B MOVE  IDIV


5.3.8  PERT


ALGOL  PERT 

Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)    555   20.0   A   XCT PUSHJ PUSHJ MOVE PUSH MOVEI MOVE PUSH HLRZ PUSHJ
                        MOVE ADD MOVE POPJ POP POP TLNE POPJ MOVE POPJ
(2)    411   13.6   A   MOVE POPJ POP POP TLNE POPJ MOVE POPJ MOVE MOVE
                        ADD CAME JRST JRST SOS CAIGE XCT PUSHJ PUSHJ MOVE
(3)    487    6.7   B   JRST AOS CAMLE MOVE ADD MOVE ADD MOVE CAIG
(4)   1461    9.5   B   MOVE ADD MOVE ADD
(5)   3415   16.3   B   MOVE ADD MOVE
(6)    622    2.8   B   JRST AOS CAMLE

Sequence (1) is the complete thunk for the parameter to SCAN, including its call by XCT in SCAN, its excursions into the run time support routines, and its return to SCAN. (2) is the loop in SCAN, when the test in the enclosed conditional is false. It overlaps the thunk in (1), but not completely. (3) is the beginning of the loop enclosing the first case statement (switch usage), including loop control. (4) is access code for two level indexing, (5) is the access code for one level indexing in vectors. (6) is loop control.
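
The cost of the thunk comes from the call by name rule: every use of the formal parameter re-evaluates the actual parameter expression in the caller's environment. The sketch below imitates this with a closure. The routine `scan` shown here is purely hypothetical; the real SCAN routine of the PERT program is not reproduced in the text.

```python
def scan(lo, hi, set_index, element, target):
    """Return the first i in lo..hi for which the by-name parameter equals target."""
    for i in range(lo, hi + 1):
        set_index(i)                 # assign to the by-name index variable
        if element() == target:      # each use of the parameter calls the thunk
            return i
    return None


# Caller side: the thunk captures the caller's variables.
vector = [3, 1, 4, 1, 5]
state = {"i": 0}


def set_i(value):
    state["i"] = value


position = scan(0, 4, set_i, lambda: vector[state["i"]], 4)
```

Each trip round the loop pays for one thunk call, which matches the observation that the thunk and the loop in SCAN overlap in the measured sequences.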




BASIC  PERT 


Maximal  sequence  length:  20. 


Seq.  Count  % Time  B/A  Sequence

(1)    874   10.5   A   JRST CAMLE MOVEM MOVEI 005 JSR JRST
                        PUSH LDB JRST JRST MOVSI MOVEI MOVE HRRZ TRNN JRST PUSHJ MOVE AOS
(2)   3989   44.8   A   MOVEI MOVE HRRZ TRNN JRST PUSHJ MOVE
                        AOS MOVE FAD TLZ CAMGE POPJ ADD ADD XCT MOVE POPJ
(3)    874    8.6   A   MOVEI MOVE HRRZ TRNN JRST PUSHJ MOVE
                        AOS MOVE FAD TLZ CAMGE POPJ ADD ADD XCT
(4)   3989   35.7   A   PUSH LDB JRST JRST MOVSI MOVEI MOVE
                        HRRZ TRNN JRST PUSHJ MOVE AOS MOVE
(5)   3115   17.2   A   005 MOVSI JSR JRST PUSH LDB JRST JRST
(6)   1002    4.6   A   JSR JRST PUSH LDB JRST JRST
(7)    926    3.3   B   MOVE FADR JRST CAMLE MOVEM

(1) is probably the SCAN loop, showing loop control and entry into the vector fetch UUO. (2) is the body of the vector fetch UUO. (3) overlaps (2) and represents the vector store operations. (4) and (5) are included as examples of bad pruning. (4) overlaps the general UUO mechanism but does not complete the vector fetch sequence of which it is a part. The same holds for (5), which contains the complete UUO mechanism but continues into the fetch. (6) is the UUO mechanism as it should be with good pruning. Its original count was 4991, with 22.9% of the time consumed by it. (7) is loop control.

BLISS  PERT 

Maximal  sequence  length:  13  (12). 


Seq.  Count  % Time  B/A  Sequence

(1)    437   12.1   A   ADD MOVE CAME JRST SOJG MOVE MOVE MOVE
(2)    487   12.8   A   AOJA CAMLE MOVE ADD MOVE ADD SKIPG
(3)    399    6.2   A   MOVE ADD MOVE ADD
(4)    527    8.2   B   MOVE ADD MOVE CAME
(5)    202    3.1   B   MOVE ADD MOVE MOVEM
(6)   1716   19.6   B   MOVE ADD MOVE
(7)    996    6.8   B   AOJA CAMLE

(1) is the loop in SCAN, when the test is not equal. (2) is the loop control and test of the loop enclosing the first CASE statement. (3) is addition of a vector element, or two level indexing. It consumed 14.5% of the time before reduction. (4) to (6) show further variants of vector access, with one or two level indexing. (7) is loop control.

FORFOR PERT

Maximal  sequence  length:  14  (12). 


Seq.  Count  % Time  B/A  Sequence

(1)    411   14.6   A   ADD ADD CAME SUB JRST MOVEM CAMGE MOVE AOJA MOVE MOVEM MOVEI
(2)    227    6.8   A   MGVu MOVE JUMPLE ADD MOVE ADD CAMGE AOJA MOVEM MOVE
(3)    536   12.5   A   ADD ADD MOVEI HRRM JSA MOVM JRA
(4)    625   10.3   B   MOVEI HRRM JSA MOVM JRA
(5)    545    8.8   A   MOVEM MOVE MOVE ADD ADD
(6)   1170   15.1   B   MOVE MOVE ADD ADD
(7)   1725   16.4   B   MOVE MOVE ADD
(8)    481    7.2   A   MOVE CAMGE AOJA MOVEM MOVE
(9)   1228   10.9   B   CAMGE AOJA MOVEM

(1) is the loop in SCAN, (2) is the beginning of the loop surrounding the first case (computed GO TO). (3) shows a rather inefficient way of obtaining absolute values; it is shown in its full glory as (4). (5) indicates that vector access with two level indexing may be of importance; this is verified by (6) and (7). (8) shows loop control in context, (9) on its own.


FORTEN  PERT 

Maximal  sequence  length:  13  (12). 


Seq.  Count  % Time  B/A  Sequence

(1)    268   11.8   A   ADD MOVE SKIPG ADD SKIPLE MOVE CAILE ADD JRST MOVM MOVE ADD
(2)    227    6.1   A   ADD MOVE SKIPG JRST ADDI AOJL MOVE ADD
(3)    487   12.1   A   ADDI AOJL MOVE ADD MOVE ADD SKIPG
(4)    411   16.3   A   MOVE ADD CAME SUB JRST MOVEM AOS ADD AOSGE JRST MOVEI
(5)    477    7.4   A   MOVE ADD MOVE ADD
(6)   1986   22.6   B   MOVE ADD MOVE
(7)    268    4.0   A   MOVE MOVEM MOVE MOVEM
(8)    913    4.9   B   ADDI AOJL

(1) is the body of the CASE statement (computed GO TO), including the preceding test and the computation of absolute value. (2) is the loop enclosing (1), as seen when the initial test is false. (3) is the same when the test is true and calculation is to proceed as in (1). (4) is the loop in SCAN. (5) and (6) show the vector accessing code, (7) indicates the need for a memory to memory move, and (8) is loop control.


5.3.9  Havie

All the results from this algorithm are dominated by the loop which calls on the integrand, and by the computation of the integrand. The only interesting feature is the use of unrounded and other unusual arithmetic in the mathematical library routines computing SQRT and EXP. We give a few examples of this.

ALGOL Havie:

Normal arithmetic used.

BASIC Havie:

(1)  1024  11.6  B  FAD MOVE FDV FAD FSC

(2)  1024   9.9  B  FDV FADR XCT FSC

BLISS Havie:

(3)   512  13.4  B  FSC MOVEM FMP FAD MOVE

(4)   512  21.2  B  FDV FAD FSC FDV FADR

(5)   512  10.5  B  FSC JRST POP POP POP

These are believed to be consecutive sequences during execution.

FORFOR Havie:

(6)  1024  21.5  B  FAD MOVE FDV FAD FSC

(7)  1024  20.8  B  FDV FADR FSC SKIPA JRA

These are believed to be consecutive. The BLISS mathematical routines were "borrowed" from the FORTRAN library; this explains the similarity of results for these two languages.

FORTEN Havie:

(8)  1024  17.7  B  MOVE FDV FAD FSC MOVE

(9)  1024  22.3  B  MOVE FDV FADR FSC POPJ


5.3.10  Ising


ALGOL  Ising 


Maximal  sequence  length:  17. 


Seq.  Count  % Time  B/A  Sequence

(1)    983   18.9   A   AOBJP MOVE EXCH ROTC LSH ANDI
(2)    438    7.8   A   LSH JUMPN ADDI MOVE HRLZI HRRI
(3)    438    8.4   A   SOJL PUSH ANDI LSH HRRZ ADDI
(4)    414    8.3   A   MOVE HRRZ PUSH HLLZ PUSH AOJA
(5)    381    7.2   A   EXCH HLRZ AOJA SOJL HLRZ ANDI
(6)    360    7.1   A   HRRZ TLNE AOJA AOBJP ROTC EXCH
(7)    396    5.5   A   CAIN HLRZ JUMPN MOVE
(8)    381    6.6   A   PUSH PUSH SOJL PUSH
(9)   1044    9.3   A   CAMLE AOS MOVE
(10)   574    5.1   A   JRST AOS MOVE


MOVE 

ROT 

LSH 

ADDI 

ANDI 

HLLZ 

HLRZ 

SETZB 

HRRZ 

ROTC 

ANDI 

AND 

CAIG 

ADDI 

JFFO 

MOVEI 

SKIPN 

SUB 

PUSH 

HRLI 

HRRZ 

MOVN 

HLRZ 

JUMPN 

MOVE 

ANDI 

AND 

LSH 

JFFO 

HRLZ 

SKIPN 

HLRZ 

PUSH 

ADDM 

PUSH 

SOJL 

HRRZ 

MOVEI 

XCT 

EXCH 

CAIE 

HLRZ 

PUSH 

SOJL 

SOJL 

PUSH 

LSH 

PUSH 

HLRZ 

AOJA 

ANDI 

SOJL 

LSH 

PUSH 

HRLZ 

JUMPN 

MOVE 

ROTC 

MOVE 

MOVE 

MOVE 

ADDI 

MOVEM 

HLLZ 

MOVEM 

SETZB 

ANDI 

MOVE 

ADD 

MOVEM 

ADD 

MOVEM 

HRRZ 

AOJA 

TLNE 

AOBJP 

HLLZ 

AOJA 

PUSH 

SOJL 

MOVEI 

PUSH 

EXCH 

AOJA 

HLRZ 

SOJL 

MOVE 

ADD 

MOVE 

MOVEM 

JRST 

MOVE 

CAMLE 

MOVE 

MOVE 

ADD 

Sequences (1) through (8) all represent parts of the run time support routines, particularly those used at routine calls and name parameter access. These functions probably account for around 50% of the execution time. (9) and (10) represent parts of some program loop or loops, possibly the assignment to nonlocal vectors in SORT.

BLISS Ising

Maximal  sequence  length:  14  (13). 


Seq.  Count  % Time  B/A  Sequence

(1)    184    8.4   A   AOJA CAMLE JRST MOVE ADD MOVE ADD AOJ MOVEM AOS MOVE CAMG JRST
(2)    784   19.3   A   CAMLE MOVE MOVE ADD MOVE MOVEM AOJA
(3)    296    6.9   A   MOVE MOVEM CAMLE MOVE MOVE ADD
(4)    381   13.6   A   PUSHJ PUSH PUSH PUSH HRRZ SUBI PUSH
(5)    378    5.7   B   POP POPJ SUB
(6)    281    5.4   B   SUB POP POPJ SUB
(7)   1163    8.0   B   AOJA CAMLE
(8)   1999   15.6   B   MOVE ADD

Here  (1)  is  a piece  of  the  SORT  routine,  containing  the  end  of  one  loop,  an  assignment 
statement  involving  a formal  vector,  and  a test  ending  an  outer  loop.  (2)  is  from  the  loops 
that  initialize  formal  vectors.  (3)  is  probably  the  initialization  of  one  of  these  loops  and  some 
of  the  loop.  The  function  entry  and  exit  sequences  are  represented  by  (4)  through  (6),  loop 
control  by  (7)  and  formal  vector  access  by  (8). 


FORFOR  Ising 

Maximal  sequence  length:  14. 


Seq.  Count  % Time  B/A  Sequence

(1)    112    7.2   A   SUB MOVEM MOVEM MOVE MOVNI MOVEM ADD MOVE MOVE MOVEM ADD CAMGE ADD
(2)    184   10.6   A   MOVEM MOVEM CAMGE AOS MOVEI MOVE ADD CAMG MOVE JRST ADD ADD
(3)    860   15.9   A   MOVE MOVEM CAMGE AOJA
(4)    245   10.3   A   JSA MOVEM MOVEM MOVEI PUSH PUSH PUSH
(5)    248    5.3   A   JRST MOVE MOVE HRROI JRA
(6)    414    6.5   B   JSA MOVEM MOVEM
(7)    657    6.6   B   MOVE ADD

The sequence (1) was not identified. (2) is the same loop as (1) for BLISS Ising, (3) is the vector initialization loops, the vectors in the FORTRAN version being held in COMMON. (4) to (6) represent the calling and exit sequences, and (7) gives an idea of the cost of formal vector access.


FORTEN Ising

Maximal sequence length: 16 (15).


Seq.  Count  % Time  B/A  Sequence

(1)    184   13.4   A   MOVE ADD MOVNI ADD ADD MOVEM MOVNI ADD MOVE SUB
                        MOVE MOVE MOVEM ADDI AOJL
(2)    184    9.2   A   MOVE ADD MOVEI ADD ADD MOVEM AOS CAMG JRST MOVE
(3)    860   15.2   A   MOVE MOVEM ADDI AOJL
(4)    360    6.0   A   MOVEI MOVEM MOVEI MOVEM
(5)    657    7.0   B   MOVE ADD
(6)    381    4.4   B   MOVE POPJ
(7)    414    6.1   B   MOVEI PUSHJ MOVEM
(8)    381    5.5   B   JRST MOVE POPJ
(9)   1144    8.4   B   ADDI AOJL

(1) is unknown, but probably in SORT. (2) is the same sequence as (1) in BLISS Ising, (3) is the initialization of the COMMON vectors in SORT, (4) is unknown, (5) is at least in part formal vector access, (6) to (8) are routine entry and exit, and (9) is loop control.


5.4  Sequences  applied  to  data  types 

Sequences (1) to (6) of the BASIC compiler consume about 30% of the total time of
compilation.  Much  of  this  could  be  saved  by  recoding  (1),  as  previously  described.  An  even 
larger  gain  in  time  would  be  achieved,  however,  if  the  PDP-10  had  an  instruction  to  move 
text  (byte  strings),  with  the  action  to  be  taken  on  each  byte  defined  by  a table.  By  a 
suitable  set  of  options  defined  by  each  table  entry,  this  instruction  could  replace  all  of  the 
constructs  pointed  to  by  sequences  (1)  to  (6).  Such  an  instruction  would  also  reduce  space 
cost  compared  to  the  recoded  form  of  (1),  and  programming  cost  in  any  case. 
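
As an illustration only, a table-driven text move of the kind suggested above can be sketched in Python. The three actions (copy with translation, skip, stop) and the names move_text and example_table are assumptions made for this sketch; the text does not fix the set of options a table entry would offer.

```python
# Hypothetical per-byte actions; the proposal does not enumerate them.
COPY, SKIP, STOP = "copy", "skip", "stop"


def move_text(source, table):
    """Move bytes from source, consulting table[byte] for each one.

    Each table entry is (action, replacement). Returns the bytes moved
    and the index in source at which the move ended.
    """
    moved = bytearray()
    for index, byte in enumerate(source):
        action, replacement = table[byte]
        if action == STOP:
            return bytes(moved), index
        if action == COPY:
            moved.append(replacement)
        # SKIP: drop the byte and continue with the next one.
    return bytes(moved), len(source)


def example_table():
    """Copy everything, fold lower case to upper, drop blanks, stop at newline."""
    table = [(COPY, b) for b in range(256)]
    for b in range(ord("a"), ord("z") + 1):
        table[b] = (COPY, b - 32)
    table[ord(" ")] = (SKIP, 0)
    table[ord("\n")] = (STOP, 0)
    return table


line, stopped_at = move_text(b"let x = 1\nnext", example_table())
```

In this sketch one call replaces the fetch, classify, translate and store loop that a compiler would otherwise code for each character of a source line.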

Character handling also shows up in the results from ALGOL, sequences (1) to (8), where it
may be assumed to consume well above 10% of the time, and in FORTEN, sequence (6), where
it consumes at least 5.5% of the time. We know that all compilers have to perform this kind
of processing; the reason it does not show up in the others may be that it is more
distributed over the program, and that text lines are not processed as an entity. If an
instruction as indicated were provided, compilers would be written to make use of it at a
benefit. It can further be safely assumed that it would find application in I/O routines; the
importance of such routines is borne out by our introductory experiments as related on page
37. The need for this type of instruction was also pointed out by Alexander [AleW72].

Another observation we can make from this material is that vector operations are important
in many different contexts, and occur to a significant degree in many of our programs: Vector
moves consume 4% to 14% of the time in Aitken, 6% to 20% in Ising. Searches in ordered
vectors consume 3% to 40% in Aitken, inner product consumes 20% to 60% of the time in Crout.
Access to vector elements consumes from 5% to 50% in many programs, most in the BASIC
programs where they are done through run time system routines.

Hence  instructions  for  vector  operators  could  be  introduced  to  advantage.  The  least  that  can 
be done is to make the vector move operation already existing in the hardware easily
available  in  higher  level  languages.  This  is  only  a first  step,  however.  We  propose  a vector 
type  along  the  following  lines: 

The  concept  of  vectors  with  a compile  time  determined  address  should  be  unified  with 
that  of  dynamically  located  vectors.  They  should  be  given  a common  formal  descriptor 
and  representation. 

The descriptor should allow for vectors stored in non consecutive but equidistant
locations. Zero should be a legal value for this distance. This would facilitate operations
on both columns and rows of matrices; vector moves would perform initialization of a
vector with a single value, vector addition would compute the sum of a vector, and so on.

Further, the vectors should be easily combinable into matrices and access to individual
elements of vectors and matrices should be no more difficult than in common
implementations in present systems.

The  operators  could  include  moves,  searches  (possibly  binary),  vector  addition,  and  inner 
product,  the  latter  accumulated  in  double  precision. 

Possibly  this  vector  type  could  further  be  unified  with  the  character  string  type  discussed 
above. 
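
The effect of allowing a zero distance can be illustrated by a small Python sketch. The descriptor layout (base, stride, length) and the names VectorDescriptor, vector_move and vector_add are assumptions of this sketch, not a proposed instruction format; memory is modelled as a plain list of words.

```python
from dataclasses import dataclass


@dataclass
class VectorDescriptor:
    base: int      # address of the first element
    stride: int    # distance between consecutive elements; 0 is legal
    length: int    # number of elements

    def address(self, i):
        return self.base + i * self.stride


def vector_move(memory, dst, src):
    """Copy src to dst element by element."""
    for i in range(dst.length):
        memory[dst.address(i)] = memory[src.address(i)]


def vector_add(memory, dst, src):
    """Add each element of src into the corresponding element of dst."""
    for i in range(src.length):
        memory[dst.address(i)] += memory[src.address(i)]


memory = [0] * 16
memory[0] = 7
# Source distance 0: every element of the target receives memory[0].
vector_move(memory, VectorDescriptor(4, 1, 5), VectorDescriptor(0, 0, 5))
# Target distance 0: all elements are accumulated into memory[10].
vector_add(memory, VectorDescriptor(10, 0, 5), VectorDescriptor(4, 1, 5))
```

With the same descriptor, a column of a 3 by 3 matrix stored by rows is simply a vector with distance 3, so no separate operators are needed for rows and columns.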


Other data instructions that might be useful are memory to memory move, and conversion
between fixed and floating point numbers. Both of these contribute significantly to the
execution time in more than one of our programs. The type conversions have in fact been
included in the KI10 processor for the DECsystem 10. This saves 4 to 5 instructions on each
use in a general context, 1 or 2 in the restricted context of BASIC matrix access. For some
BASIC programs, this could amount to 3% or 4% of the execution time.

Finally  we  remark  that  instructions  for  packing  can  save  considerable  time  where  they  exist 
in  the  ISP  and  are  made  available  by  the  compiler  used.  The  language  PASCAL  [WirN71] 
shows  how  this  can  be  integrated  into  a rigid  type  mechanism. 

Two  objections  against  some  of  these  instructions  are  that  they  do  not  easily  fit  into  the 
PDP-10  instruction  format,  and  the  difficulty  of  accessing  them  from  current  higher  level 
languages.  The  latter  problem  can,  in  part  at  least,  be  solved  by  giving  them  the  syntactic 
status  of  subroutines.  This  is  already  commonly  done  for  operations  like  negate  and  absolute 
value. 


5.4.1  Summary 

In  the  previous  section  we  proposed  several  data  types  and  instructions  for  inclusion  in  the 
PDP-10. For each of these, evidence of its usefulness was found in several algorithms and
across  most  languages.  The  sequences  used  to  perform  these  operations  were  different  from 
language  to  language,  but  the  underlying  operations  were  the  same.  This  convinced  us  that 
our  results  are  valid  descriptions  of  the  needs  of  algorithms.  For  subject  set  selection  it 
indicates  that  the  intended  area  of  application  should  be  covered  reasonably  well,  but  that 
the  choice  of  language  is  less  important. 


5.5  Properties  of  operands 

As  mentioned  in  the  introduction  to  this  chapter,  data  types  desirable  for  inclusion  in  the  ISP 
are not only those that are expensive to simulate using existing operators. Other data types
might  be  desirable  in  order  to  reduce  the  space  cost  of  data  storage,  and  to  some  extent  the 
time  cost  of  the  operators. 

Examples are given by Wortman [WorD72] and Alexander [AleW72]. They have observed the
distribution of written constants in source programs, and found that a large fraction of the
integer constants can be held in very few bits (93% and 56% respectively in 4 bits; the
discrepancy may be caused by Wortman's use of student programs, whereas Alexander used
larger programs). One would expect a similar observation to hold for dynamic occurrences of
integers.

If  the  operands  of  each  instruction  are  written  on  the  trace,  this  dynamic  distribution  can 
easily  be  observed.  To  relate  these  observations  back  to  specific  storage  locations  and 
variables,  and  to  find  the  maximum  space  needed  for  each  variable,  would  require  an  array 
equal  to  the  whole  data  area  of  the  subject  program,  hence  this  is  a relatively  expensive 
analysis.  Furthermore  several  variables  might  share  the  same  physical  storage  location, 
adding  further  complication.  Hence  the  utility  of  a hardware  subrange  type  is  not  easily 
determined  exactly,  although  a good  indication  could  be  found.  We  do  not  do  this  at  present. 
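
The simple part of this analysis, the dynamic distribution of operand sizes without reference to storage locations, might look as follows in Python. The function names and the sample values are invented for illustration; the sign is ignored, which is an assumption of the sketch.

```python
from collections import Counter


def bits_needed(value):
    """Bits needed for the magnitude of value (sign ignored, at least 1)."""
    return max(1, abs(value).bit_length())


def cumulative_width_distribution(operands):
    """Map each width w to the fraction of operands that fit in w bits."""
    counts = Counter(bits_needed(v) for v in operands)
    total = len(operands)
    fractions = {}
    running = 0
    for width in sorted(counts):
        running += counts[width]
        fractions[width] = running / total
    return fractions


# Made-up operand values standing in for one traced run.
sample = [0, 1, 2, 3, 15, 16, 1000]
distribution = cumulative_width_distribution(sample)
```

Here distribution[4] is the fraction of traced operands that fit in 4 bits, the dynamic counterpart of the static figures quoted from Wortman and Alexander.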

To  do  a similar  analysis  for  floating  point  types  is  even  harder,  since  there  is  no  way  of 
telling  how  much  of  the  accuracy  provided  is  really  necessary.  This  must  be  left  to  numerical 
analysts.  A weak  indication  is  provided  by  observing  the  usage  of  immediate  type  floating 
point  instructions. 


Non-uniform  distribution  of  values  is  not  a phenomenon  restricted  to  written  integer 
constants.  It  has  been  observed,  as  reported  by  Hamming  [HamR70]  and  Pinkham  [PinR61], 
that  "naturally  occurring  numbers"  do  not  have  uniformly  distributed  mantissae.  Rather,  the 
mantissae  seem  to  be  distributed  according  to  the  density  function: 

r(x) = 1/(x * ln(b))    (1/b < x < 1)

where b is the base of the number system. For a binary computer with mantissae in [0.5, 1>,
this seems to imply that about 58% of the mantissae would be in [0.5, 0.75>. The essential
property of this distribution seems to be its invariance to scale transformations.
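
The figure for the binary case follows from integrating the density over the interval, which gives ln(hi/lo)/ln(b). A short Python check, with mantissa_fraction as a name chosen for this sketch:

```python
import math


def mantissa_fraction(lo, hi, base=2):
    """Integral of 1/(x ln b) from lo to hi, for 1/base <= lo <= hi <= 1."""
    return (math.log(hi) - math.log(lo)) / math.log(base)


lower_half = mantissa_fraction(0.5, 0.75)   # ln(1.5)/ln(2), about 0.585
whole_range = mantissa_fraction(0.5, 1.0)   # the density integrates to 1
```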

Tracing methods can be used to obtain more experimental verification of this, and to evaluate
methods designed to exploit it. Other observations of operand values could have relevance
for:

Variable length data types
Representation of control and addressing information
Rounding procedures in floating arithmetic


5.6 Data types, Conclusions

In  this  chapter  we  have  presented  various  methods  for  detecting  unnecessary  data  types  and 
operators in existing ISPs, and for detecting nonexistent but desirable ones.

The  former  methods  are  based  on  frequency  counts  of  instructions,  and  most  of  them  have 
also  been  presented  by  other  workers  in  the  field.  Our  conclusions  about  these  methods 
were  presented  in  Section  5.1.3.  We  pointed  out  that  the  results  are  sensitive  to  changes 
both  in  programming  language  and  algorithm,  and  hence  that  a subject  set  should  be  well 
distributed  over  the  area  of  application  and  over  the  languages  used. 

For  the  latter  problem,  we  presented  a heuristic  algorithm  for  detecting  significant  dynamic 
sequences  of  instructions.  This  algorithm,  including  the  heuristics,  is  our  work.  The  algorithm 
is  structured  so  that  the  heuristics  are  easily  changed,  and  new  heuristics  may  be  easily 
added.  This  method  is  also  applicable  to  control  operators  and  address  calculation. 

The  results  were  presented  in  Section  5.4.  They  are  less  dependent  on  language  and 
algorithms  than  the  frequency  results,  and  properties  common  to  the  programs  are  brought 
out  strongly.  This  led  us  to  propose  several  types  and  operators  for  inclusion  in  the  ISP 
that  we  worked  on.  A subject  set  for  this  method  need  not  represent  many  languages,  but 
should  cover  most  concepts  of  the  intended  area  of  application. 

Finally  we  propose  that  desirable  data  types  may  also  be  suggested  by  a study  of  the 
operand  values  from  existing  data  types.  No  experimental  results  from  this  method  are 
presented. 


CHAPTER 6
CONTROL  OPERATORS 


Our major methods for studying control operators are the same as for data operators, i.e.
frequency counts in their various guises, and instruction sequences. The results of the
sequence studies are presented in Section 6.1. We give no comments on the frequency
results beyond those given in Section 5.1.3. We also propose some new methods for use in
particular  situations.  These  are  discussed  in  Section  6.2. 

Frequency  counts  indicate  that  control  operators,  as  defined  below,  account  for  a large 
fraction of the total number of instructions executed (33% by our SNIFT, Figure 5-4).
Furthermore,  control  structures  are  among  the  most  important  means  of  structuring 
programs.  It  follows  that  efficient  implementation  of  control  operators  contributes  to 
reduced  programming  cost  as  well  as  time  and  space  cost. 

Further  motivation  for  studying  control  structures  and  operators  is  found  in  the  difficulties  of 
compiler  writing,  particularly  in  code  optimization.  A great  deal  of  effort  at  both  compile  and 
run  time  goes  into  maintaining  (setting  and  restoring)  state  information.  This  applies  on 
subroutine  and  coroutine  calls  as  well  as  in  more  local  control  contexts  where  several 
program  branches  merge.  The  inability  of  compilers  to  cope  with  this  problem  is  one  of  the 
major  reasons  for  generation  of  inefficient  code.  An  alternative  approach  to  the  problem 
would  be  to  design  ISPs  such  that  the  amount  of  state  to  be  maintained  is  less,  or  where  it 
can  be  saved  and  restored  more  efficiently. 

Control  operators  are  primarily  those  which  may  change  the  contents  of  the  program  counter 
to a value different from the default value (old value + 1, n+1'th address, etc.). Since almost
all  programs  are  written  in  higher  level  languages,  it  is  reasonable  to  extend  this  definition  to 
include  instructions  used  for  implementing  higher  level  control  structures.  Such  control 
structures  may  be  grouped  as: 

Statement level:

Unconditional jumps
Conditionals
Case selection
Loops

Program level:

Subroutines
Coroutines
Parallel processes (tasks)

On  the  program  level,  program  context  changes  and  program  communication  are  most 
important.  Communication  ranges  from  parameter  and  result  passing  for  subroutines  to 
synchronization  for  processes. 

Our  methods  are  not  suited  to  analysis  of  programs  with  processes,  since  such  programs,  and 
certainly  the  most  important  ones,  have  to  execute  at  full  speed  in  order  to  adequately 
handle  the  real  time  situation  they  are  designed  for.  The  slowdown  caused  by  the  tracing 
interpreter  would  therefore  perturb  the  results. 

There may also be more or less control associated with the operators of the language, i.e. the
programmer  may  or  may  not  have  to  supply  explicitly  the  control  necessary  for,  say,  matrix 
operations,  depending  on  the  language  (FORTRAN  vs.  APL).  If  the  control  is  supplied  with  the 
operator,  the  compiler  can  in  general  generate  more  efficient  code,  since  the  context  is 
better  defined. 

The most important classes of control operators on the ISP level may now be described as:

Unconditional jumps
Simple tests (implying jumps or skips)
Loop jumps (count, test and jump)
Subroutine and return jumps
Stack manipulating instructions
Execute instructions
Some monitor calls
Other instructions in special contexts


6.1  Sequences  applied  to  control 

In this section we discuss those sequences from Section 5.3 that are relevant to control
operators. 

Most noticeable is the cost of the run-time system for ALGOL programs. This consumes 50%
of the execution time for Ising, 20% for Crout. To achieve a reasonable efficiency for ALGOL
programs with many routine calls and name parameters, special instructions and descriptor
formats should be introduced. This observation is not new; it has influenced several ISP
designs, in particular those of the Burroughs B5000 and its descendants and siblings.

A related feature, more common to all the languages, is the cost of subroutine calls. This is
most easily spotted in BLISS programs, since the BLISS calling sequences include stack
instructions that are never used in other contexts. In the BLISS compiler calling sequences
consume at least 25% of the time, in the FORTEN compiler at least 15%. Both of these
compilers were written in BLISS*. In the other programs where we have observations, the
time consumed varies between 5% and 20% of the total: 5% in FORTEN Crout, 12% in FORTEN
Ising, and over 16% in FORFOR Ising.

The  functions  performed  by  these  sequences  are  transmission  of  parameters  and  result, 
manipulation  of  return  linkage,  and  state  setting.  The  latter  includes  setting  up  system 
registers  as  well  as  saving  and  restoring  user  registers.  The  exact  constructs  needed 
depend  heavily  on  the  language.  We  present  one  example: 

BLISS  programs  would  execute  considerably  more  efficiently  if  the  PUSHJ  and  POPJ 
instructions  could  manipulate  the  F register**,  and  remove  the  parameters  from  the  stack 
after  exit.  The  address  field  of  the  POPJ  instruction,  presently  unused,  could  be  used  to  hold 
the  number  of  parameters,  so  there  would  be  no  space  cost  at  the  call  site,  and  the  change 
would  fit  cleanly  into  the  existing  structure.  This  would  reduce  the  instruction  count  by  4 in 
each call, more in some cases. For the BLISS compiler 1/8 of the instruction count would be
saved this way; this is about half of the instructions executed in calling sequences. If one
were  able  to  specify  which  registers  to  save  on  entry  and  restore  on  exit,  two  further 
instructions  could  be  saved  on  each  call  for  each  such  register.  There  is,  however,  no  room 
in  the  instruction  word  to  specify  this.  This  is  a problem  common  to  all  calling  sequences. 

A variant  of  the  subroutine  call  is  the  UUO.  In  our  material  this  is  used  almost  only  to  call 
the  BASIC  run  time  system.  Since  this  includes  vector  and  array  accessing,  UUOs  are 
frequently used by BASIC programs, and the central UUO handler of BASIC contributes 15%
to 23% of the total execution time. This UUO handler, which consists of 6 instructions,


* Two  reasons  for  the  difference  may  be  that  parameters  of  FORTEN  are  passed  in  registers, 
or  that  there  are  fewer  small  routines. 

**  The  F register  points  to  the  activation  record  of  the  most  recently  entered  routine. 
processes the return linkage and selects the right run time routine. Parameters and state
are processed at the call site and in the individual routines. Hence the cost of UUOs is
extremely high compared to using one of the subroutine call instructions. An exception is
when only one UUO is used. In that case the central UUO handler reduces to one instruction.
The advantage of UUOs over other subroutine calls is that they allow a memory address
(subjected to the standard effective address calculation) and an accumulator address to be
transmitted to the routine at no extra cost in space or time at the call site. They also permit
linkage to subroutines through a table defined at load time and with no name
correspondence. This is of small importance, however. From this we conclude that UUOs
should be used only in very special circumstances where the extra time cost is justified.
UUOs are also discussed in Section 5.1.3.


Another common construct is loop control. This often consumes no more than 2% to 5% of the
execution time, but may consume as much as 9% (Aitken-L) or 10% (FORFOR PERT). It
appeared in at least 16 programs, consuming at least 2% of the time in each. In spite of the
looping instructions provided in the PDP-10, most loop control sequences consist of two or
more instructions. This is primarily due to the fact that most loops count upward to a nonzero
limit, hence loop control needs to address both the limit and the branch target (assuming
the counter to be in a register and the increment to be 1). Contributing are the facts that
languages often require the test to be performed at the beginning of the loop but the
stepping of the counter at its end, and the need to store the loop counter in memory.

Results  reported  by  Knuth  [KnuD70],  Shaw  [ShaM71],  and  Alexander  [AleW72],  for  FORTRAN, 
ALGOL and XPL, show that 93% to 95% of all written counting loops have an increment of one.
This  form  of  loop  could  be  done  more  efficiently  in  the  PDP-10  if  the  AOBJN  (Add  one  to 
both,  jump  if  negative)  were  used.  This  instruction  keeps  the  loop  counter  in  the  right  half 
of  a register,  the  left  half  is  initialized  to  the  negative  of  the  desired  number  of  traversals  of 
the  loop.  Each  time  the  AOBJN  is  executed,  both  halves  of  the  register  are  incremented  by 
one,  and  the  jump  is  taken  if  the  result  (i.e.  the  left  half)  is  negative. 
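
The behaviour of AOBJN as described above can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the register is a 36 bit word with two 18 bit halves, the halves are incremented independently, and the helper names (make_aobjn_word, aobjn, run_loop) are ours.

```python
HALF_BITS = 18
HALF_MASK = (1 << HALF_BITS) - 1
HALF_SIGN = 1 << (HALF_BITS - 1)


def make_aobjn_word(count, start=0):
    """Left half: negative of the traversal count. Right half: first index."""
    left = (-count) & HALF_MASK
    return (left << HALF_BITS) | (start & HALF_MASK)


def aobjn(word):
    """Add one to both halves; report whether the jump is taken."""
    left = ((word >> HALF_BITS) + 1) & HALF_MASK
    right = ((word & HALF_MASK) + 1) & HALF_MASK
    jump = bool(left & HALF_SIGN)   # left half still negative
    return (left << HALF_BITS) | right, jump


def run_loop(count, start=0):
    """Return the index values seen by a loop closed by AOBJN."""
    word = make_aobjn_word(count, start)
    visited = []
    while True:
        visited.append(word & HALF_MASK)   # loop body uses the right half
        word, jump = aobjn(word)
        if not jump:
            return visited
```

For example, run_loop(3, 5) visits the indices 5, 6 and 7 with a single loop control instruction per traversal. The model also makes the restriction visible: count and index must each fit in 18 bits.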

This  instruction  is  rarely  used  in  our  subject  set:  709  times  in  our  1 million  instruction  SNIFT. 
The  reason  is  that  extra  tests  must  be  performed  to  make  sure  that  the  bound  and  counter 
will  not  overflow  the  halfword  allocated  to  them.  This  suggests  that  two  registers  should  be 
used,  one  to  hold  the  upper  bound  and  one  for  the  counter.  Our  results  in  Chapter  4 show 
that  there  are  sufficiently  many  registers  to  permit  this.  Downwards  count  to  a nonzero  limit 
can  be  handled  by  a similar  instruction. 

Commonly used sequences for loop control consist of an AOJXX CAMXX pair. Our instruction
will execute in less time than the CAMXX, since no memory operand is needed. Hence these
instructions would reduce the time cost of loop control by 40% to 50%, or up to 5% of the
execution time of some programs. For very short loops, such as initialization of vectors, this
saving could be a significant fraction of the time of the loop. The prologue may imply a
larger space cost than for most present loop controls. The hardware cost is that of adding
the new instructions. The instructions integrate reasonably well into the PDP-10 ISP
structure, hence the programming cost will probably be reduced.

We  finally  draw  attention  to  various  forms  of  testing  that  are  prominent  in  some  of  our 
subject  programs.  This  is  seen  in  the  ALGOL  run  time  system  and  in  the  compilers,  and 
consumes 2% to 11% of the time. The ALGOL run time system also does a great deal of bit
manipulation. We cannot suggest any improvements on these operations without further
knowledge  of  their  semantics. 


6.2  Some  special  problems 

In  this  section  we  discuss  some  problems  associated  with  control  operators  in  general,  or 
with  special  control  operators,  which  are  not  easily  solved  using  the  more  general  methods. 


6.2.1  Control  information 


An important aspect of control operations is the control information, i.e. that information
which is processed by the normal data operators, but whose main raison d'être is its use for
control  purposes.  This  includes  loop  counters,  stack  pointers,  return  addresses  and  other 
addresses,  parameter  descriptors,  displays,  etc.  Ideas  for  improved  control  operators  might 
come  from  studying  how  such  information  is  processed. 


We  make  the  simplifying  assumption  that  we  may  disregard  information  stored  in  primary 
memory,  and  consider  only  register  contents.  The  information  in  a register  is  used  for 
control  purposes  at  the  control  points,  i.e.  whenever  the  register  is  addressed  by  a control 
operator.  We  are  interested  in  the  history  of  control  information  accumulated  at  control 
point  t. 

The sequences of Sections 5.2 and 6.1 tell us something about this, but they have several
deficiencies: they are not accumulated at control points, they contain instructions irrelevant
to the control information, and they cover too short a span of time.

Another  form  of  history  which  we  already  have  is  the  register  usage  classes  of  Section  4.5. 
These classes are also inadequate for the present purpose, since only the kinds of events in
the life of the register are known; their order and number are unknown.

A third  form  of  history  is  the  sequence  of  instructions  that  operated  on  the  specific  register 
before the control point was reached. Such register sequences can be collected by a
process  somewhat  similar  to  that  described  in  Section  5.2,  but  in  many  ways  simpler.  Its 
main  properties  are: 

a)  Sequences  are  accumulated  separately  for  each  register,  and  only  instructions 
affecting that register are included.

b)  Each  sequence  is  restricted  to  one  R-life  of  that  register.  (R-life  defined  in  Section 
4.2).  This  might  cause  some  sequences  (particularly  those  representing  the  history  of 
a loop counter) to become very long. A Kleene star kind of concept would be useful
in  such  cases,  or  the  sequences  may  be  truncated  at  the  old  end. 

c)  Sequences  are  tabulated  each  time  the  register  is  used  for  a control  purpose. 

d) The collection takes place in one pass. If space is scarce, some kind of pruning might
be  necessary. 

In  such  histories,  the  time  order  of  the  events  is  preserved,  but  only  events  affecting  the 
particular register are recorded. If parts of the computation have taken place in other
registers,  this  information  is  lost.  We  do  not  believe  this  to  be  a serious  problem,  however. 
If  it  is,  one  may  build  the  expression  trees  for  the  information  instead  of  the  sequences. 
Techniques  for  doing  this  are  constantly  used  in  compilers,  though  with  the  opposite  goal.  In 
such  trees  the  exact  order  of  operations  is  lost,  and  only  those  aspects  of  it  are  preserved 
which  are  relevant  to  the  arithmetic  value  of  the  result. 

We propose register sequences as the method for studying control information that is most
likely to give useful results at a reasonable cost. We have, however, not programmed this
method, and hence have no experimental results to support this contention.
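
Since the method was not programmed, the following Python fragment is only a sketch of how properties a) to d) might be realized. The trace format (opcode, register, loads, is_control) is invented for the sketch, a load is assumed to start a new R-life, and long sequences are truncated at the old end.

```python
from collections import Counter, defaultdict


def collect_register_sequences(trace, max_len=8):
    """One pass over the trace; tabulate a register's history at control points.

    trace yields (opcode, register, loads, is_control) tuples, where loads
    is True if the instruction loads the register and is_control is True
    if the register is addressed by a control operator.
    """
    current = defaultdict(list)   # a) one sequence per register
    table = Counter()
    for opcode, register, loads, is_control in trace:
        if loads:
            current[register] = []            # b) restrict to one R-life
        current[register].append(opcode)
        if len(current[register]) > max_len:
            del current[register][0]          # b) truncate at the old end
        if is_control:
            key = (register, tuple(current[register]))
            table[key] += 1                   # c) tabulate at control points
    return table


example_trace = [
    ("MOVEI", 1, True, False),
    ("ADDI", 1, False, False),
    ("MOVE", 2, True, False),
    ("CAMG", 1, False, True),
    ("ADDI", 1, False, False),
    ("CAMG", 1, False, True),
]
sequences = collect_register_sequences(example_trace)
```

In the example the instruction on register 2 does not appear in the history of register 1, and the second control point records the longer history MOVEI ADDI CAMG ADDI CAMG of the same life.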


6.2.2 Test instructions

To  perform  a test,  3 addresses  are  needed:  two  for  the  values  to  compare  and  one  for  the 
instruction  that  is  to  be  executed  if  the  test  succeeds.  On  the  other  hand,  most  ISPs  have  at 
most 2 addresses in each instruction (memory address and register or 2 memory addresses).
Three  techniques  are  commonly  in  use  to  solve  this  problem: 

a)  An  implicit  operand,  usually  0,  is  used  for  the  test.  This  method  is  adequate  when  the 
value  tested  either  does  not  have  to  be  computed,  or  is  used  for  other  purposes  than 
testing.  This  can  be  studied  by  register  sequences,  possibly  extended  beyond  the 
control  point. 

b)  An  implicit  change  (SKIP),  usually  1,  2 or  3,  is  made  in  the  value  of  PC  depending  on 
the result of the test (succeeds or fails; >, = or <). This may require another 1 or 2
jump  instructions  to  follow  the  skip  instruction,  but  at  most  one  of  these  is  executed, 
often  none.  This  method  is  adequate  when  the  false  path  is  exactly  one  instruction 
long,  and  continues  into  the  true  path.  Sequences  may  be  used  to  study  the  relative 
frequencies of SKIP JUMP and SKIP NO-JUMP pairs. This requires a modification to
the  sequence  program  so  that  these  combinations  are  always  printed  before  they  are 
pruned.  Many  SKIP  NO-JUMP  pairs  indicate  that  this  construct  is  used  to  advantage. 

c)  A condition  code  (CC)  is  used  to  store  the  result  of  the  test.  This  is  subsequently 
tested  by  an  instruction  which  specifies  the  conditional  new  value  of  PC  in  its  address 
field  and  the  desired  state  of  CC  in  its  opcode  or  register  address  field.  If  CC  is  set 
by  the  arithmetic  instructions,  the  first  instruction  of  this  pair  is  not  always 
necessary and this scheme may or may not be more economical in space and time
costs  than  the  ones  previously  described.  This  method  is  adequate  if  the  value 
tested  is  that  most  recently  computed  and  it  is  also  used  for  other  purposes. 

If  the  ISP  under  study  does  not  use  CC’s,  a few  lines  of  code  in  the  program  that 
accumulates  IFT’s  will  simulate  a CC.  The  tables  that  describe  the  instructions  in 
terms  of  the  program  structure  distribution  must  be  available.  In  this  way  we  may 
estimate  how  frequently  the  introduction  of  condition  codes  would  have  simplified  the 
program. 
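
As this was not implemented, the following Python sketch only indicates what such a simulation might count. It assumes a trace reduced to (kind, register) pairs with the hypothetical kinds "arith", "load", "test_zero" and "other", and it assumes that only arithmetic instructions set the simulated CC.

```python
def count_tests_covered_by_cc(trace):
    """Count zero tests that a condition code would have made unnecessary.

    A test against zero is covered if it tests the register whose value
    was produced by the most recent arithmetic instruction and that
    register has not been reloaded since.
    """
    cc_register = None
    tests = covered = 0
    for kind, register in trace:
        if kind == "arith":
            cc_register = register        # simulated CC now describes it
        elif kind == "load":
            if register == cc_register:
                cc_register = None        # CC no longer matches the register
        elif kind == "test_zero":
            tests += 1
            if register == cc_register:
                covered += 1
        # Other instructions leave the simulated CC unchanged.
    return tests, covered


example_trace = [
    ("arith", 1), ("test_zero", 1),
    ("arith", 2), ("load", 2), ("test_zero", 2),
    ("arith", 3), ("other", 0), ("test_zero", 3),
]
tests, covered = count_tests_covered_by_cc(example_trace)
```

The ratio of covered tests to all tests gives a rough estimate of how often a condition code would have removed the explicit test instruction.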

None  of  the  above  methods  were  implemented;  some  of  the  other  results,  however,  have 
some  bearing  on  these  problems. 

The program structure distribution, presented in Figure 5-4, indicates that the accumulator is
most  often  tested  against  memory.  The  compilers  form  an  exception;  here  the  bit  tests  and 
the  tests  against  an  immediate  operand  are  more  used.  The  importance  of  testing  against 
memory  may  in  part  be  due  to  the  use  of  these  instructions  in  the  loop  control.  Bit  testing 
and  testing  against  immediate  operands  are  second  in  importance;  tests  against  0 are  least 
important.  However,  testing  memory  against  0 is  as  important  as  the  analogous  test  for  the 
accumulators.  Taken  together,  the  tests  against  zero  are  almost  as  important  as  the 
accumulator  versus  memory  tests.  These  results  refer  to  instruction  count.  In  computed 
time, the tests involving memory increase in relative importance.

We  conclude  that  programmers  prefer  b)  to  a),  and  that  they  rarely  need  to  test  values 
genuinely for zero, at least not recently computed ones. The memory against zero tests are
most common in compilers; this may indicate tests of long lasting status indicators, table
entries, etc.


6.3  Control  operators,  Conclusions 


This  concludes  our  discussion  of  control  operators.  We  have  presented  the  results  from  the 
sequence  method  as  applied  to  control  structures,  and  also  suggested  some  other  methods 
for  obtaining  additional  information.  The  latter  methods,  however,  have  not  been 
implemented. 

The  detailed  implementation  of  control  varies  more  from  language  to  language  than  does  the 
use of data operators. This is particularly so for languages that use a run time system for
their  space  allocation  and  parameter  transmission.  There  is  also  some  variation  from 
algorithm  to  algorithm  due  to  the  different  degrees  to  which  the  algorithms  use  certain 
control  structures,  and  in  particular  those  that  involve  the  run  time  system.  Differences  are 
also  inherent  in  the  forms  of  processing  that  the  algorithms  do,  as  is  evident  from  the 
program  structure  distributions  in  Figure  5-4.  We  also  found  significant  similarities  across 
languages  and  algorithms.  This  is  clearly  seen  in  the  program  structure  distribution,  and 
even  more  clearly  in  the  sequences.  In  the  latter  case,  though  the  sequences  differ  in  detail, 
they  reflect  common  underlying  control  concepts,  and  can  in  many  cases  be  unified.  This  led 
us  to  propose  a modification  of  an  existing  instruction  for  loop  control,  and  to  point  out  a 
basic  flaw  of  the  routine  call  instructions.  We  also  pointed  out  the  inefficiency  of  the  UUO 
concept  of  the  PDP-10. 

If the goal is to detect which control structures are common, the subject set need not
represent  many  languages,  but  it  should  be  well  distributed  over  all  control  concepts  used  in 
the  area  of  application.  However,  the  detailed  implementation  of  these  control  concepts  is 
highly  language  dependent,  particularly  where  a run  time  system  is  used.  Hence  a thorough 
analysis  of  programs  from  the  particular  language  should  be  done  if  detailed  implementation 
is  the  goal. 

Our  results  do  in  fact  suggest  that  the  ISP  should  have  separate  control  operators,  possibly 
microprogrammed,  for  each  commonly  used  language. 

For  the  same  reasons  as  when  we  discussed  data  types,  the  generality  and  consistency  of 
our  results  lead  us  to  believe  in  our  methods.  Our  remark  in  the  introduction  to  this  chapter 
about  compilers  and  state  maintenance  correlates  well  with  our  findings  about  routine  calls. 
Finally  we  remark  that  our  results  agree  well  with  experience,  intuition  and  afterthought. 


CHAPTER 7


ADDRESS  CALCULATION 


By  address  calculation  (in  a wide  sense)  we  mean  the  calculation  of  an  effective  address  to 
operands  or  instructions  in  physical  memory,  based  on  information  provided  in  the  instruction 
word,  in  memories  addressed  by  the  instruction  word,  and  on  other  information  held  in  the 
processor  state.  Within  the  problem  area  so  outlined,  there  are  3 subproblems: 

a)  Address  calculation  for  data  structuring  and  control  operations,  which  is  discussed  in 
Section 7.1. Some of our sequence results are relevant to this problem. These
are  discussed  in  Section  7.1.1.  We  also  propose  some  other  methods  for  special 
problems  in  Section  7.1.3.  Some  of  these  are  closely  related  to  those  proposed 
for  control  operators  in  Section  6.2. 

b)  The  problem  of  mapping  a large  virtual  memory  into  a small  real  one.  This  problem 
has  been  addressed  by  many  authors,  hence  we  do  not  discuss  it  here,  but  refer  the 
reader  to  work  mentioned  in  Section  1.4.  The  basic  idea  of  these  methods  is  to  study 
the  stream  of  effective  addresses,  and  observe  how  locality  in  time  implies  locality  in 
space. 

c)  Uniting  the  need  for  a large  name  space  with  a short  address  field.  We  propose  no 
method  for  this  problem;  it  can  be  studied  by  methods  similar  to  those  used  for  b). 


7.1  Data  structuring 

The  most  common  tools  in  address  calculation  are  indexing,  indirection,  and  base  registers. 
We  discuss  our  methods  and  results  for  indirection  and  indexing.  The  use  of  base  registers 
is  closely  tied  to  problem  c)  above.  Since  we  present  no  methods  for  this  problem,  we  only 
mention  base  registers  in  passing. 


Following a terminology proposed by Foster [FosC70] we will mean by nominator a cell
containing an (indirect) address, and by nominee the cell thus addressed. Our other
terminology  is  standard. 


7.1.1  Sequences  applied  to  addressing 

In  this  section  we  discuss  those  of  the  sequences  in  Section  5.2  which  are  relevant  to  data 
structuring,  and  which  indicate  the  need  for  more  specialized  address  calculating  techniques. 
Our  results  reveal  two  related  such  structures,  namely  vectors  and  matrices. 

Vector access consumes 5% or more of the time of at least 14 of our programs, much more in
two special cases: 53% in BASIC PERT, and 46% in ALGOL PERT which has a vector element as
a name parameter. It consumes more than 10% of the time in Aitken-G, Aitken-L, ALGOL
Bairstow,  BLISS  PERT,  FORFOR  PERT,  FORTEN  PERT  and  BLISS  Ising,  where  more  conventional 
accessing  methods  are  used.  In  many  accesses  in  PERT  the  index  is  itself  an  indexed 
variable,  a fact  which  contributes  to  the  cost  for  that  algorithm. 

Vector  access  is  particularly  time  consuming  when  the  base  address  of  the  vector  is  not 
known  to  the  compiler,  that  is  when  the  vector  is  passed  as  a parameter  or  when  dynamic 
space  allocation  is  used.  The  problem  could  be  reduced  by  addressing  vector  elements 
indirectly  through  a nominator  whose  written  address  is  the  base  of  the  vector.  This  would 
require  that  the  same  index  register  was  used  for  all  accesses  to  the  vector.  The  compilers 
that  we  used  do  not  seem  willing  to  accept  this  restriction. 

In  Section  5.4  we  proposed  the  introduction  of  a vector  type  to  handle  vector  operations  as 
well  as  access.  Alternatively  some  other  solution,  such  as  the  introduction  of  base  registers, 
should  be  found  to  reduce  the  accessing  cost. 

The  other  data  structure  giving  rise  to  significant  sequences  is  matrices.  Matrices  are  used 
in Crout, SEC, and Aitken-L. The time cost of accessing was 7% of the total computed time in
Aitken-L, and 15% to 20% in SEC. The costs for the versions of Crout are not comparable,
due to the special use of UUOs in BASIC, and the non-uniform use of double precision
arithmetic which consumes much of the time where used. They were: 11.5% for ALGOL Crout,
60% for BASIC Crout, 39% for BLISS Crout and approximately 20% for the FORTRAN versions.
The time advantage of using Iliffe vectors is clearly seen in the ALGOL Crout result.

In many algorithms, such as Crout, the matrix elements are accessed in a systematic manner
as row or column vectors. Hence this cost could be reduced by introducing the vector type
proposed  in  Section  5.4  or  by  adequate  language  constructs.  To  speed  up  genuine  random 
access  to  matrices,  a matrix  type  with  special  descriptors  and  operators  could  be  devised. 
This  should  be  integrated  with  the  vector  type.  A step  in  this  direction  has  been  taken  in 
the Burroughs B5000 and related computers. A vector is described by a one word
descriptor; the vector so described may itself consist of vector descriptors (i.e. it is an Iliffe
vector) and so on.
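
The difference between the two accessing methods can be illustrated with a minimal sketch. It is not taken from any of the compilers studied; the flat `memory` list and the function names are assumptions made for illustration only. A conventional access computes the address with a multiplication, while an Iliffe vector replaces the multiplication by one extra fetch through a vector of row base addresses.

```python
def build_iliffe(memory, rows, cols):
    """Append a rows x cols matrix to `memory`, reached through a vector
    of row base addresses. Returns the address of that vector."""
    top = len(memory)
    memory.extend([0] * rows)            # one pointer word per row
    for i in range(rows):
        memory[top + i] = len(memory)    # base address of row i
        memory.extend([0.0] * cols)      # the row itself
    return top


def iliffe_address(memory, top, i, j):
    """Address of element (i, j): one fetch of the row base, one add."""
    return memory[top + i] + j


def row_major_address(base, cols, i, j):
    """Conventional address arithmetic: one multiply and two adds."""
    return base + i * cols + j


memory = []
top = build_iliffe(memory, rows=3, cols=4)
memory[iliffe_address(memory, top, 2, 1)] = 7.0
```

In this sketch the row vectors happen to be contiguous, but nothing in `iliffe_address` relies on that, which is what lets a descriptor-based scheme handle vectors of vectors uniformly.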


7.1.2 Indexing and indirection

By  observing  the  frequencies  of  use  of  indirection  and  indexing,  we  may  assess  the  utility  of 
those  features.  Thinking  the  utility  of  indexing  to  be  above  doubt,  we  did  not  actually  count 
the  number  of  instructions  using  it.  We  did,  however,  count  the  number  of  register  lives 
used  for  indexing,  and  we  also  observed  what  other  kinds  of  operations  those  lives  were 
subject to. These are the register usage classes of Section 4.5. Our observations are
reported  in  Figure  4-17  and  Section  4.5. 

We  did  observe  the  frequency  of  use  of  indirection,  and  also  to  how  many  levels  indirection 
was carried, whether the nominator was in a register, and whether pre indexing or post
indexing or both were used*.


Two level indirection was observed in all the ALGOL programs, and in FORTEN Crout and
FORTEN  Ising,  the  level  2 nominators  comprising  from  about  1/10  to  2/3  of  the  total  number 
of  nominators  in  these  cases.  Indirection  off  byte  pointers  was  found  in  FORFOR,  FORFOR 
Bairstow,  FORFOR  PERT  and  FORFOR  SEC,  probably  associated  with  I/O,  and  comprising  about 
2.6% of the total number of indirect accesses.

Post indexing was found in the ALGOL programs and in the ALGOL, BASIC and FORFOR
compilers. In FORFOR 6.7% of the nominators were indexed, in ALGOL PERT 63.8%. For the
other  programs  the  percentage  ranged  between  20  and  50.  Our  other  results  are  displayed 
in  figures  7-1  through  7-3. 

* By  pre  indexing  we  mean  indexing  used  in  the  instruction  word  to  access  the  (first) 
nominator.  By  post  indexing  we  mean  indexing  in  the  nominator  to  access  the  data  or  the 
next  nominator. 

The low number of indirections through registers indicates that indirection could not be
replaced  by  indexing  except  at  the  cost  of  extra  LOAD  instructions. 

The  results  for  the  ALGOL  programs  indicate  that  two  level  indexing  may  be  useful  in  certain 
circumstances,  for  instance  where  the  access  path  is  computed  and  has  a relatively  long 
lifetime,  or  where  it  depends  on  more  than  one  index.  Indirection  to  one  level  is  justified  by 
being  used  in  most  programs;  one  instruction  execution  is  saved  on  each  indirection  not 
through  a register.  The  instruction  count  of  FORTEN  Crout  would  increase  by  over  11.  if 
indirection  were  removed,  and  by  21  or  more  for  14  of  the  41  subject  programs. 
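
The indirection statistics of this section can be collected by following each indirect chain while the effective address is computed. The sketch below is a simplified model, not the tracer actually used: the `AddressWord` record, the `registers` and `memory` dictionaries and the statistics names are assumptions made for illustration. It follows the PDP-10 convention that index field 0 means no indexing, and it uses the definitions of pre and post indexing given in the footnote above.

```python
from collections import namedtuple

# Simplified address part of a word: indirect bit, index register, address field.
AddressWord = namedtuple("AddressWord", "indirect index y")

ADDRESS_MASK = 0o777777   # 18-bit addresses, as on the PDP-10
NUM_REGISTERS = 16        # the accumulators occupy addresses 0..15


def effective_address(instr, registers, memory, max_levels=16):
    """Follow the indirect chain of `instr`; return (address, statistics)."""
    def one_step(word):
        offset = registers[word.index] if word.index else 0
        return (word.y + offset) & ADDRESS_MASK

    stats = {"levels": 0, "pre_indexed": False, "post_indexed": False,
             "nominator_in_register": False}
    word = instr
    # Pre indexing: indexing in the instruction word to reach the first nominator.
    stats["pre_indexed"] = bool(word.indirect and word.index)
    addr = one_step(word)
    while word.indirect:
        if stats["levels"] >= max_levels:
            raise RuntimeError("indirect chain too long")
        stats["levels"] += 1
        if addr < NUM_REGISTERS:
            stats["nominator_in_register"] = True
        word = memory[addr]               # fetch the nominator
        if word.index:                    # post indexing: indexing in the nominator
            stats["post_indexed"] = True
        addr = one_step(word)
    return addr, stats


registers = {1: 2, 2: 5}
memory = {0o102: AddressWord(False, 2, 0o200)}
addr, stats = effective_address(AddressWord(True, 1, 0o100), registers, memory)
```

Summing the `levels` field over a trace gives the depth distribution of indirection, and the three flags correspond to the quantities reported in this section.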


7.1.3  Addressing  information 

By  addressing  information  we  mean  computed  information  used  in  address  calculation,  such  as 
indexes or nominators. The analogy with control information is obvious, and information
about  them  may  be  collected  in  the  same  way,  except  that  addressing  information  is 
collected  at  addressing  points,  defined  by  analogy  to  control  points.  The  reader  is  referred 
to  Section  6.2.1,  which  applies  mutatis  mutandis  to  addressing  information. 

A study  of  addressing  information  might  reveal  important  manipulation  of  such  information, 
that  could  lead  to  new  address  calculation  algorithms  in  the  ISP.  Analysis  of  addressing 
information  should  be  correlated  well  with  that  of  control  information,  particularly  loop 
counts  and  case  selectors,  which  from  other  experience  might  be  expected  to  play  a double 
role. 

It may also be of interest to study the context of indexed data accesses. Indexing may be
used in several contexts, and the following can probably be distinguished mechanically:

Record  access,  with  constant  offset  and  computed  base. 

Array  access,  with  computed  offset  and  constant  base. 

Array  access,  with  computed  base  and  computed  offset. 

Immediate  operands. 

FIGURE 7-2: Fraction of nominators in a register

FIGURE 7-3: Fraction of indirections pre indexed

7.1.4 Operand and result modes

Related to address calculation is the choice of destination for the result of data operations,
and  of  the  order  of  the  operand  for  non-commutative  operators  (Examples:  Add  accumulator 
to  memory,  result  to  memory;  Subtract  accumulator  from  memory;  etc.).  These  variants  of  the 
operators  may  be  expressed  as  part  of  the  opcode,  or  by  special  addressing  modes.  If  such 
modes  exist  on  the  ISP  in  question,  their  utility  can  be  assessed  by  frequency  counts.  If 
such  modes  do  not  exist,  sequences  do  not  suffice  to  establish  the  need  for  them,  since 
information about the identity of operands is needed. The "result to memory" mode is
indicated by the occurrence of OPERATE STORE pairs with the same address. If the
accumulator contents are used after such a pair, the indication is for a "result to both" mode.
The  "inverse  order  of  operand"  mode  is  needed  if  a large  number  of  LOAD  OPERATE  pairs 
exist,  where  both  specify  the  same  accumulator,  and  the  OPERATE  is  noncommutative  and 
addresses  a register  for  its  memory  operand. 
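
The detection of OPERATE STORE pairs described above can be sketched as follows. This is only an illustration of the idea, since the detection was not implemented in this work; the `Op` record and the three instruction kinds are assumptions, and any later reference to the accumulator other than a LOAD is counted as a use.

```python
from collections import Counter, namedtuple

# One trace entry: kind is "LOAD", "OPERATE" or "STORE"; ac is the
# accumulator number; addr is the effective address of the memory operand.
Op = namedtuple("Op", "kind ac addr")


def result_mode_candidates(trace):
    """Count OPERATE STORE pairs that a special result mode could replace."""
    counts = Counter()
    for i in range(len(trace) - 1):
        first, second = trace[i], trace[i + 1]
        if (first.kind, second.kind) != ("OPERATE", "STORE"):
            continue
        if first.ac != second.ac or first.addr != second.addr:
            continue
        # Next instruction that refers to the same accumulator, if any.
        later = next((op for op in trace[i + 2:] if op.ac == first.ac), None)
        if later is not None and later.kind != "LOAD":
            counts["result to both"] += 1     # accumulator value is used again
        else:
            counts["result to memory"] += 1   # accumulator value is dead
    return counts


trace = [
    Op("LOAD", 1, 100), Op("OPERATE", 1, 200), Op("STORE", 1, 200),
    Op("LOAD", 1, 300), Op("OPERATE", 1, 400), Op("STORE", 1, 400),
    Op("OPERATE", 1, 500), Op("STORE", 1, 600),
]
counts = result_mode_candidates(trace)
```

The last pair in the example is not counted, because the STORE goes to a different address than the OPERATE used.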

We  did  not  implement  detection  of  such  sequences,  and  hence  have  no  indications  for  or 
against  the  need  for  "inverse  order  of  operand"  instructions  in  the  PDP-10.  The  frequency 
counts in the SNIFT indicate that both the "result to memory" and the "result to both" modes
are used, particularly for the commutative operators. Thus FADRB represents 14%, and
FADRM 21% of all the occurrences of FADRX instructions* in our SNIFT; FMPRB
represents 2.4% of all the FMPRXs. Similarly the immediate mode for floating point arithmetic
is justified, with 6.4% of the FADRXs and 5.4% of the FMPRXs.


7.2  Addressing,  Conclusions 

The  most  important  part  of  this  chapter  discussed  those  results  of  our  sequence  method 
which  applied  to  address  calculation.  These  results  indicated  a need  for  improved  accessing 
methods for matrices, and for vectors with a dynamically determined base address, such as
vectors  passed  as  parameters. 

We  further  presented  some  results  from  our  SNIFT,  throwing  light  on  the  use  of  different 
result  destinations  for  arithmetic  operators.  Due  to  our  restricted  subject  set,  these  latter 
results  are  considered  inconclusive,  but  they  do  suggest  a need  for  the  "result  to  memory" 
and the "result to both" modes on the PDP-10.

There  is  nothing  in  these  results  to  contradict  our  earlier  conclusions  about  the  validity  of 
our  methods.  We  refer  the  reader  to  the  conclusion  sections  of  chapters  5 and  6,  which  also 
apply  here,  but  with  some  less  weight  on  the  dependency  of  operator  implementation  on 
language. 

Finally  we  presented  some  results  on  the  use  of  indirection.  These  show  that  one  level  of 
indirection  is  certainly  useful  for  our  subject  set,  possibly  two.  Both  pre  and  post  indexing 
were used.

* FADR is floating add with rounding, FMPR is floating multiply with rounding. The suffix X
indicates  the  special  mode:  Both,  Memory  or  Immediate. 


CHAPTER 8
CONCLUSION 


In this thesis we have developed some methods for evaluation of the architecture of
instruction set processors. The methods are based on analyzing traces of program execution;
the traces contain information about every instruction executed by the program. The traces
are  written  as  the  program  is  executed  on  an  interpreter  for  the  ISP  under  investigation.  A 
set  of  programs,  the  subject  set,  is  used  to  represent  the  workload  of  the  ISP. 

The  main  advantages  of  these  methods  are: 

a)  The  level  of  detail  to  which  they  permit  us  to  go.  In  general  every  instruction 
executed,  as  well  as  any  desirable  information  from  the  processor  state  between 
instructions,  is  easily  recorded  on  the  trace.  If  desired,  parts  of  the  instruction 
interpretation  may  be  simulated,  and  information  from  this  traced.  In  our  case  we 
recorded  the  instruction  word,  effective  address,  program  counter,  indirect  chains, 
byte  pointers  and  final  operands. 

b)  The  general  applicability  of  the  methods.  The  subject  set  can  usually  be  chosen 
among  any  programs  that  can  be  compiled  into  the  standard  relocatable  format  used 
on  the  processor.  The  methods  are  not  restricted  to  a single  language  or  set  of 
languages. 

c) The ease of programming of the methods. Other methods could conceivably provide
some of the same information, but would imply a considerable analysis of relocatable
programs  or  core  images  to  reconstruct  instruction  sequences  and  register  usage. 

The  subject  programs  have  to  be  brought  into  a format  acceptable  to  the  interpreter. 
Usually the standard relocatable format is convenient. For an ISP under design it may
therefore  be  difficult  or  impossible  to  obtain  a representative  subject  set.  However,  in  these 
days  of  microprogramming,  it  is  not  improbable  that  compilers  may  be  written  for  an  ISP 
before  the  ISP  itself  is  frozen.  For  existing  ISPs,  as  in  our  experimental  work,  the 
interpreter  may  run  on  its  own  ISP.  In  such  cases  the  relocatable  form  of  the  subject 
programs may be used, and no restrictions are posed on the selection of the subject set.




8.1  Overview  of  the  methods 

In  chapters  4 through  7 we  presented  various  issues  of  ISP  architecture,  viz.  register 
structure,  data  types  and  operators,  control  operators  and  address  calculation.  In  each 
chapter  we  presented  methods  to  deal  with  these  issues,  together  with  experimental  results 
obtained  using  our  subject  set. 

Some  of  the  methods  were  the  same,  or  analogous,  for  several  ISP  problems.  We  now  review 
the  methods  in  a methodologically  systematic  manner.  They  fall  in  five  categories: 

Instruction  sequences,  with  the  variant  register  sequences.  Sequences  are  used  to 
assess  the  need  for  new  data  types  and  data  operators,  control  operators,  and 
addressing  modes.  Register  sequences  (i.e.  instruction  sequences  restricted  to 
instructions  affecting  one  register)  can  be  used  for  studying  control  and  addressing 
information  in  more  detail  and  with  greater  accuracy  than  is  permitted  by  the  general 
sequences.

Frequency  counts  of  instruction  usage.  The  instruction  frequency  table  can  be  displayed 
in  different  formats,  sorted  by  execution  frequency  or  by  time  consumed,  grouped  into 
distributions/mixes,  or  output  in  the  form  of  the  FGR  function.  From  these  results  we  can 
see  which  operators  were  not  used,  and  can  be  omitted.  We  can  also  estimate  the  cost 
incurred by having to recode some of the instructions if the instruction set is reduced,
and  we  can  see  which  instructions  are  candidates  for  improved  implementation. 

Register  life  classification.  We  showed  how  to  detect  register  lives  (R-lives),  and  how 
they  could  be  classified  according  to  the  use  made  of  the  registers  during  the  lives.  This 
information  can  be  used  to  assess  the  need  for  generality  of  registers. 

Simultaneity  of  register  lives.  We  presented  algorithms  to  detect  how  many  registers  are 
used  simultaneously,  and  to  calculate  upper  bounds  for  the  time  cost  incurred  if  the 
number  of  registers  were  to  be  reduced  while  preserving  the  rest  of  the  ISP  structure. 
These  calculations  may  be  done  for  each  of  a number  of  classes  of  registers,  as  defined 
above,  as  well  as  for  the  total  set  of  registers. 

Miscellaneous  methods.  We  proposed  several  special  methods  for  special  problems. 
These  can  be  used  to  investigate  indirection,  the  utility  of  condition  codes  and  other 
solutions  to  the  addressing  problem  for  test  instructions,  distribution  of  operand  values 
with partword operands in mind, and so on. One may also implement methods for special
properties  of  the  ISP,  such  as  byte  pointers  on  the  PDP-10. 
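
As an illustration of the register life and simultaneity methods listed above, the following minimal sketch detects R-lives and counts how many registers are live at each instruction. It is not the two-phase program described in this thesis; the trace format (one pair of register sets per instruction, the registers used and the registers loaded) is an assumption made for the example. A life runs from a load to the last use before the next load of the same register.

```python
def register_lives(trace):
    """trace: list of (uses, loads) pairs of register-number sets, one pair
    per executed instruction. Returns a list of (register, start, end)."""
    lives = []
    current = {}                       # register -> [load time, last use time]
    for t, (uses, loads) in enumerate(trace):
        for r in uses:
            if r in current:
                current[r][1] = t      # extend the life to this use
        for r in loads:
            if r in current:
                lives.append((r, *current[r]))   # a reload closes the old life
            current[r] = [t, t]
    for r, (start, end) in current.items():
        lives.append((r, start, end))
    return sorted(lives, key=lambda life: (life[1], life[0]))


def live_counts(lives, length):
    """Number of simultaneously live registers at each instruction."""
    counts = [0] * length
    for _, start, end in lives:
        for t in range(start, end + 1):
            counts[t] += 1
    return counts


trace = [
    (set(), {1}),     # load register 1
    (set(), {2}),     # load register 2
    ({1}, set()),     # use register 1
    ({2}, {1}),       # use register 2, reload register 1
    ({1}, set()),     # use register 1
]
lives = register_lives(trace)
counts = live_counts(lives, len(trace))
```

The distribution of the values in `counts` is the kind of data from which one can read off how often more than a given number of registers are needed at the same time.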


The  methods  have  different  needs  for  data  space  and  tables  of  descriptions.  They  also  use 
different parts of the trace input. These factors, and also the forms of analysis performed,
have  some  implications  for  the  programming  of  the  methods: 

The  instruction  sequence  algorithm  makes  many  passes  over  the  trace,  and  needs  a large 
data  space,  but  only  the  instruction  word  is  needed  from  the  trace,  and  no  tables  of 
descriptors are needed. Hence this program should be preceded by a program that
condenses the trace. This latter program can also accumulate the IFT and print its various
forms. This latter process requires several tables of descriptors but a moderate amount of
data space.

The  algorithm  for  simultaneity  of  register  lives  has  two  phases,  the  former  writing  a special 
file for use by the latter. Neither phase uses much data space, but the first needs some
table space. These tables are the same as are used for R-life classification. The latter
algorithm needs some data space, but not overly much. Hence it may be programmed with
the  first  phase  of  the  simultaneity  algorithm. 

In  this  first  phase  all  register  usage,  including  indexing  and  indirection  through  registers, 
must be detected. For this the effective address is needed. Hence the indirection statistics
are best accumulated in this program, and also the special sequences for operand and result
modes, if space permits.

To accumulate register sequences we need information about the addresses, to see which
registers are used, so that the instruction can be associated with the proper register(s).
Also, some data space is needed to store the sequences. These sequences can furthermore
be collected in one pass. Hence this algorithm does not blend as well with the general
sequence algorithm as might be believed at first sight. Many of the same routines and
structures can be used, but the main control is different. Hence this method is best
programmed separately.

The same holds for operand analysis. For this method the tables of descriptions used for
the  Gibson  or  Program  Structure  distributions  are  needed.  From  the  trace,  we  need  the 
instruction  word  and  the  operand  words. 


8.2 Validity of the methods

In Section 1.2.1 we discussed various methods for collecting dynamic data. It is at this point
evident  that  we  could  not  have  obtained  our  major  results  without  using  traces.  Both  the 
methods  for  register  structure  and  the  sequence  method  require  the  exact  sequence  of 
instructions executed. The register results also require the indirect chains and byte pointers
as  well  as  the  effective  rather  than  the  written  address  of  most  instructions.  This  amount  of 
detail,  and  the  preservation  of  sequentiality  which  is  inherent  in  tracing,  could  not  be 
obtained  by  any  of  the  other  methods  discussed  in  Section  1.2.1.  Jump  tracing  could  not  be 
used,  since  we  could  not  have  recorded  indirect  chains  or  effective  addresses  that  way. 

Many  of  the  methods  are  exact.  This  applies  in  particular  to  the  instruction  frequency 
results,  the  register  results  up  to  simultaneity,  the  register  classification  results,  and  the 
miscellaneous small methods. Hence for these methods the validity of the results depends
mostly  on  the  selection  of  the  subject  set. 

The  sequence  method  is  particularly  inexact,  due  to  its  use  of  heuristic  methods,  and  to  the 
need  for  manual  analysis.  However,  the  results  from  this  method  showed  very  general 
results,  and  many  of  the  sequences  found  represented  general  concepts  not  particular  to  the 
language  or  algorithm  where  they  were  found.  This  supports  our  contention  that  these 
results  are  valid  and  useful. 

The  cost  of  reducing  the  number  of  registers  is  also  inexact,  being  an  upper  bound.  Our 
intention  was  to  check  these  results  for  some  of  our  BLISS  programs.  In  theory  and  manuals 
the  BLISS  compiler  permits  the  programmer  to  reserve  a number  of  registers,  so  that  they 
are  not  used  by  the  object  program  except  where  explicitly  named  in  the  source  program. 
However,  the  compiler  refused  to  generate  code  for  such  unwholesome  conditions,  and  the 
verification  could  not  be  done. 

Our  experimental  results  show  good  internal  consistency.  Many  of  the  results  are  in  general 
trend  independent  of  both  the  algorithm  and  the  programming  language  in  which  it  was 
coded,  and  the  details  often  show  systematic  variation  with  language  and  with  algorithm. 
Examples  are  the  register  results  for  ALGOL  and  BASIC  programs,  and  the  use  of  floating 
point arithmetic in Bairstow, Crout and Havie. This is a strong support for their validity.

Some  of  the  results  also  agree  well  with  previous  knowledge  - the  state  maintenance 
problem  for  compilers  as  discussed  in  Chapter  6 is  one  example,  another  is  the  good 
agreement  of  our  Gibson  distribution  with  those  of  Gibson  and  Gonter. 

The dependence on language is most important for those languages that use a run time
system  for  significant  parts  of  their  control  and  accessing  functions.  In  the  case  of  ALGOL, 
both  the  sequence  results  and  the  register  lives  were  clearly  influenced  by  this.  BASIC  also 
influenced  the  results  more  than  did  FORTRAN  and  BLISS.  This  is  because  BASIC  uses  only 
one  type,  because  no  information  is  kept  in  registers  between  statements,  and  because  a run 
time  system  is  frequently  used.  Hence  languages  with  such  special  properties  should  be 
represented  in  the  subject  set  if  they  are  used.  Also,  register  usage  in  general  depends  on 
language. 

Our  Aitken  results  show  that  the  variation  due  to  programmer  habits  can  be  large.  Analysis 
of the source programs shows that the variation is due mostly to the selection of strategies
for  subproblems,  but  that  application  of  coding  tricks  also  plays  a part.  Our  sample  is  too 
small  to  show  more  than  this.  The  variation  is  mostly  in  the  sequence  results,  less  in  register 
usage.  This  suggests  that  register  usage  is  more  a function  of  the  language  and  compiler 
used  than  of  the  programmer  or  algorithm. 

The  register  results  are  not  particularly  dependent  on  algorithm.  This  is  natural,  since  higher 
level  languages  hide  register  usage  from  the  programmer.  The  choice  of  algorithm  has  a 
strong  influence  on  the  use  of  data  operators  and  data  structures. 

The  results  from  the  FORTRAN  programs  show  good  correlation  between  the  two  compilers. 
This  may  indicate  that  language  has  more  influence  on  the  object  program  structure  than  do 
compilers.  The  observation  may  be  peculiar  to  FORTRAN,  which  is  a well  understood 
language. 

A deficiency  of  the  methods  in  general  is  that  to  a large  extent  they  depend  on  the  compilers 
available  for  the  machine  analyzed.  A particularly  bad  or  unusual  implementation  of  a 
commonly  used  language  may  flavour  a whole  analysis,  and  in  no  case  do  the  results  of  an 
analysis  reflect  usage  of  ISP  features  beyond  those  that  can  be  made  available  to  programs 
within  the  state  of  the  art  of  compiler  writing.  On  the  other  hand,  the  results  do  indicate 
what  is  needed  to  generate  good  code  for  existing  languages  using  existing  compiler 
techniques. 

Similarly,  if  an  analysis  indicates  the  need  for  a new  operator  or  other  feature  in  the  ISP,  it 
is  not  sufficient  to  implement  it  in  the  processor.  It  must  also  be  made  available  to  the  users 
through  the  languages  they  use.  This  may  cause  compiler-technical  and  linguistic  problems. 




When  selecting  a subject  set  for  a full  scale  analysis,  care  should  be  taken  so  that  the  area 
of applications is well represented. In particular, all important data structuring methods and
special  operations  should  be  included.  The  matrix  access  of  Crout,  and  the  unnormalized 
arithmetic  in  certain  contexts  clearly  show  this;  they  are  significant  where  they  occur.  The 
individual  subject  programs  should  be  large  enough  that  the  problem  of  dominating  loops  is 
reduced  to  its  right  proportions.  Good  representation  of  languages  is  important  for  register 
analysis,  and  particularly  for  details  of  control  structures  and  access  methods  for  data 
structures. It is less important for data operators.

Another  problem  occurs  when  analyzing  large  programs.  How  can  one  represent  all  aspects 
of  the  program  within  a trace  of  at  most  about  one  million  instructions?  The  obvious  solution 
is  a slight  modification  to  the  tracer,  and  possibly  the  operating  system,  so  that  the  tracer 
can  be  "turned  on"  for  maybe  5000  instructions*,  then  off  for  a period  of  time  in  which  the 
program  executes  at  full  speed,  and  then  on  again.  Each  time  the  tracer  is  turned  on 
computation  in  the  subject  program  has  progressed  significantly,  and  different  sections  of  it 
will  be  traced.  We  do  not,  with  this  method,  have  any  guarantee  that  the  resulting  trace 
represents  a cross  section  of  the  program,  but  our  hope  is  better  than  by  tracing  a 
consecutive  tape-full. 
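
The proposed modification of the tracer amounts to a simple on/off sampling of the instruction stream. The sketch below shows the idea on an arbitrary iterable of instructions; it is not the tracer itself, and the length of the untraced period (`off`) is an assumption, since only the traced window of about 5000 instructions is suggested above.

```python
from itertools import islice


def sampled_trace(instructions, on=5000, off=100000):
    """Yield windows of `on` consecutive instructions, skipping `off`
    instructions (executed untraced, at full speed) between windows."""
    stream = iter(instructions)
    while True:
        window = list(islice(stream, on))
        if not window:
            return
        yield window
        for _ in islice(stream, off):   # tracer turned off
            pass


windows = list(sampled_trace(range(25), on=3, off=7))
```

With a budget of about one million traced instructions, windows of 5000 instructions would give roughly 200 samples spread over the run, instead of one consecutive stretch.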


8.3  Specific  results 

We  now  repeat  some  of  the  specific  results  obtained  using  our  subject  set  on  the  PDP-10. 
We  believe  most  of  them  generalize  to  similar  ISPs. 

Register  utilization  was  low.  The  average  number  of  live  registers  was  7 or  less  for  all 
programs, the number of registers used was 10 or less 90% of the time for all programs,
and 8 or less 98% of the time for 29 of the 41 programs. Time here is the instruction
count. If the ISP had only 8 registers, the instruction count of the programs would
increase by less than 20% for all programs.

The instruction count of calling sequences can be as high as 25% of the total instruction
count.  This  is  particularly  noteworthy  in  view  of  the  common  assumption  that  well 
structured  programs  will  have  many  subroutines. 


* It should be long enough that transients caused by the endpoints are insignificant.

The utilization of the opcodes was low. Our subject set used only 274 out of 421
different user instructions. One set of 128 instructions would suffice for 98.8% of the
computed time, and a slightly different set of 128 instructions would suffice for 98.6% of
the  executed  instructions.  We  note  in  passing  that  an  instruction  set  of  128  instructions 
is  twice  the  size  of  that  of  the  CDC  6000  Central  Processor  ISP,  and  about  the  same  size 
as  that  of  the  IBM  360. 
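
The coverage figures above follow from sorting the opcodes by weight and accumulating. The sketch below illustrates this in Python; it is not the analysis program used here, and the function name and the example counts are invented for illustration. The weight may be either the instruction count or the computed time, which is why two slightly different sets can result.

```python
from typing import Dict, List, Tuple


def smallest_covering_set(weights: Dict[str, float],
                          coverage: float) -> Tuple[List[str], float]:
    """Return the fewest opcodes whose combined weight reaches the given fraction."""
    total = sum(weights.values())
    chosen: List[str] = []
    covered = 0.0
    # Taking opcodes in order of decreasing weight gives the smallest such set.
    for opcode, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(opcode)
        covered += weight
    return chosen, covered / total


# Invented counts, for illustration only.
example = {"MOVE": 50, "ADD": 30, "PUSHJ": 15, "FDV": 4, "XOR": 1}
subset, achieved = smallest_covering_set(example, 0.9)
```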

Much  time  was  consumed  by  vector  operations  or  in  operations  that  could  be  subsumed 
under  a general  vector  type.  This  is  also  true  for  programs  that  do  not  use  the 
mathematical  concepts  of  vectors  or  matrices.  A vector  type  with  sufficiently  general 
operators could be used to advantage by most of our programs. Possibly as much as 30%
to 40% of the execution time could be saved in some cases.

We  also  mention  the  need  for  character  string  operations,  and  the  high  cost  of  using 
UUOs. 

The PDP-10 has a very spacious instruction word, hence both a rich instruction set and a
large addressing space. Several of the results above indicate that a reduction of the functions
is a possibility, thus freeing instruction word space. Our suggestions for addition of functions
do not nearly consume this space. In fact, the additions indicated could probably be made
using the instruction word space which is already available. For an ISP where space is
scarce, microprogramming could provide one way of using it efficiently for a given class of
applications (see our discussion of the Burroughs B1700, page 15).


8.4  Improvements  to  the  methods 

Our  present  programs  could  be  improved  in  several  ways: 

The  pruning  heuristics  used  for  the  sequence  collection  are  not  adequate,  as  discussed  in 
Section  5.2.2.  We  would  expect  improved  heuristics  to  significantly  reduce  the  amount  of 
insignificant  output  from  this  algorithm,  with  correspondingly  simplified  manual  analysis. 

The results of Figure 4-27 show that we would have achieved a lower cost for reduction of
the number of registers if we had pronounced the registers to be dead after a dormancy of
only 100 or 60 instructions, instead of 200. An even lower number should be used if the
cost is high when using 60.

All of our analysis programs are fairly slow. We believe worthwhile reductions in the cost of
analysis could be achieved by coding critical routines in machine code, and by cleaning up
certain inefficiencies causing extra parameter transmissions.

What  is  most  needed,  however,  is  to  try  the  methods  out  in  a large  scale  analysis  using  a 
significantly  larger  subject  set,  where  the  individual  programs  also  are  larger.  Only  when 
such  an  analysis  has  been  successfully  completed  can  we  claim  that  our  methods  have  really 
proved  their  worth. 


8.4.1  New  methods 

Some  new  methods  could  be  implemented.  These  include  the  operand  analysis,  register 
sequences  and  other  methods  outlined  in  previous  chapters,  but  also  one  more  general  one: 

Each  instruction  could  be  mapped  into  its  generalization  in  the  Program  Structure 
classification,  and  sequences  of  such  general  instructions  accumulated.  This  would  bring 
certain  control  operations  out  more  clearly,  as  for  example  SKIP  JUMP  sequences,  since  the 
conditions  on  the  tests  would  be  suppressed.  Also,  we  could  hope  to  obtain  information  on 
common  expression  forms,  generalized  calling  sequences  and  loop  control,  etc. 
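
As an illustration of this proposal, the sketch below maps each opcode to a general class and counts sequences of the generalized instructions. It is a sketch in Python under stated assumptions: the mapping table is a tiny hypothetical stand-in for the Program Structure classification, and the function names are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, heavily abbreviated mapping from opcode to its generalization.
# Suppressing the test condition makes SKIPE, SKIPN, ... all appear as SKIP.
GENERALIZATION: Dict[str, str] = {
    "SKIPE": "SKIP", "SKIPN": "SKIP", "SKIPL": "SKIP",
    "JUMPE": "JUMP", "JUMPN": "JUMP", "JUMPL": "JUMP",
    "MOVE": "MOVE", "MOVEM": "MOVE",
}


def generalize(trace: List[str]) -> List[str]:
    """Replace each opcode by its general class; unknown opcodes are kept as-is."""
    return [GENERALIZATION.get(op, op) for op in trace]


def count_sequences(trace: List[str], length: int = 2) -> Counter:
    """Count every run of `length` consecutive generalized instructions."""
    general = generalize(trace)
    return Counter(
        tuple(general[i:i + length]) for i in range(len(general) - length + 1)
    )


pairs = count_sequences(["SKIPE", "JUMPN", "MOVE", "SKIPN", "JUMPE"])
most_common: List[Tuple[Tuple[str, ...], int]] = pairs.most_common(1)
```

In this small example the two SKIP JUMP pairs, which differ in their test conditions, are counted as the same sequence.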

If the results of such analyses show that the number of sequences found in each analysis is
low, and that commonality between algorithms is significant, results of such analyses might be
combined to represent the whole subject set, in a way analogous to our present SNIFT.

[Line-printer listing: descriptions of the register usage codes (halfword modified, modified by logical operation, tested, used for byte pointer, used for indirect address, among others), followed by SAMPLE INSTRUCTION DESCRIPTIONS for a floating divide with result to accumulator and memory, a multiply immediate, and an add-one-to-accumulator-and-jump instruction. The entries themselves are illegible in this copy.]


APPENDIX C

Output from register classification program


Fixpoint addition and subtraction are referred to as counter operations.
The non-obvious encodings of the usage parameters are:

COFIX  Counter and fixpoint arithmetic      XIMDA  Indexing immediate and data access
FXFLO  Fixed and floating point             XIMJM  Indexing immediate and jumps
COFLO  Counter and floating point           XDAJM  Indexing data access and jumps
CFXFL  Counter, fixed and floating          XDJIM  Indexing data accesses, jumps and immediate
                                            XDATA  Indexing data accesses
                                            XJUMP  Indexing jumps
                                            XIMME  Indexing immediate operands

MASK refers to the class definition given to the output procedure.
UNION CLASS is the union of all classes listed above it.
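
The six-digit octal class codes in these listings can be read as bit masks over the usage categories. The sketch below decodes such a code into category names. It is an illustration in Python, not part of the classification program; the bit assignment is an assumption inferred from legible rows of the listings (for example, 000104 being interpreted as FLOAT STORE), and the function name `interpret` is invented.

```python
from typing import List

# Assumed layout, inferred from the listings: the lowest octal digit encodes the
# arithmetic usage, the next digit the indexing usage, and each higher bit one flag.
ARITHMETIC = ["", "COUNT", "FIXPT", "COFIX", "FLOAT", "COFLO", "FXFLO", "CFXFL"]
INDEXING = ["", "XDATA", "XJUMP", "XDAJM", "XIMME", "XIMDA", "XIMJM", "XDJIM"]
FLAGS = ["STORE", "HALFW", "BYTOP", "LOGIC", "SHIFT", "STACK",
         "ADDRS", "TESTS", "MONIT", "BYTPT", "INDRK", "EXECT"]


def interpret(class_code: str) -> List[str]:
    """Decode an octal class code such as '023107' into usage category names."""
    mask = int(class_code, 8)
    names: List[str] = []
    for table, shift in ((ARITHMETIC, 0), (INDEXING, 3)):
        name = table[(mask >> shift) & 0o7]
        if name:
            names.append(name)
    for i, flag in enumerate(FLAGS):
        if mask & (1 << (6 + i)):
            names.append(flag)
    return names
```

Under this assumed layout, the union of several classes is the bitwise OR of their codes, and the complement is taken within 777777.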


THE FULL SET


MASK AND ITS COMPLEMENT

777777  CFXFL XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS MONIT BYTPT INDRK EXECT
000000

[Table of usage classes with columns CLASS, COUNT, FRACTION, CUMUL. FRACT., AVRGE LENGTH and INTERPRETATION; the individual rows are illegible in this copy.]

6133 LIFETIMES. 66 DIFFERENT CLASSES.


UNION CLASS AND ITS COMPLEMENT

737777  CFXFL XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS BYTPT INDRK EXECT
040000  MONIT


CLASSES USED FOR INDEXING


MASK AND ITS COMPLEMENT

000070  XDJIM
777707  CFXFL STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS MONIT BYTPT INDRK EXECT

[Table of usage classes with columns CLASS, COUNT, FRACTION, CUMUL. FRACT., AVRGE LENGTH and INTERPRETATION; the individual rows are illegible in this copy.]

1477 LIFETIMES. 30 DIFFERENT CLASSES.


UNION CLASS AND ITS COMPLEMENT

337773  COFIX XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS BYTPT INDRK
440004  FLOAT MONIT EXECT


THE ARITHMETIC CLASSES


CLASS: NO ARITHMETIC


MASK AND ITS COMPLEMENT

000007  CFXFL
777770  XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS MONIT BYTPT INDRK EXECT

[Table of usage classes with columns CLASS, COUNT, FRACTION, CUMUL. FRACT., AVRGE LENGTH and INTERPRETATION; the individual rows are illegible in this copy.]

3028 LIFETIMES. 42 DIFFERENT CLASSES.


UNION CLASS AND ITS COMPLEMENT

737770  XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS BYTPT INDRK EXECT
040007  CFXFL MONIT

CLASS: FIXPOINT ADD AND SUBTRACT


MASK AND ITS COMPLEMENT

000001  COUNT
777776  FXFLO XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS MONIT BYTPT INDRK EXECT

[Table of usage classes with columns CLASS, COUNT, FRACTION, CUMUL. FRACT., AVRGE LENGTH and INTERPRETATION; the individual rows are illegible in this copy.]


UNION CLASS AND ITS COMPLEMENT

737777  CFXFL XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS BYTPT INDRK EXECT
040000  MONIT

CLASS: FULL FIXPOINT ARITHMETIC


MASK AND ITS COMPLEMENT

000002  FIXPT
777775  COFLO XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS MONIT BYTPT INDRK EXECT

[Table of usage classes with columns CLASS, COUNT, FRACTION, CUMUL. FRACT., AVRGE LENGTH and INTERPRETATION; the individual rows are illegible in this copy.]


UNION CLASS AND ITS COMPLEMENT

023757  CFXFL XIMDA STORE HALFW BYTOP LOGIC SHIFT TESTS
754020  XJUMP STACK ADDRS MONIT BYTPT INDRK EXECT

CLASS: FLOATING ARITHMETIC


MASK AND ITS COMPLEMENT

000004  FLOAT
777773  COFIX XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS MONIT BYTPT INDRK EXECT

                  FRAC-   CUMUL.    AVRGE
CLASS    COUNT    TION    FRACT.   LENGTH   INTERPRETATION
000104    1349    0.220   0.220      5.55   FLOAT STORE
000004     712    0.116   0.336      3.63   FLOAT
020104      20    0.003   0.339     26.00   FLOAT STORE TESTS
023104      12    0.002   0.341     79.08   FLOAT STORE LOGIC SHIFT TESTS
002104      12    0.002   0.343     19.33   FLOAT STORE SHIFT
000205      12    0.002   0.345     10.00   COFLO HALFW
023107       6    0.001   0.346     17.00   CFXFL STORE LOGIC SHIFT TESTS
023106       6    0.001   0.347     14.00   FXFLO STORE LOGIC SHIFT TESTS

2129 LIFETIMES. 8 DIFFERENT CLASSES.


UNION CLASS AND ITS COMPLEMENT

023307  CFXFL STORE HALFW LOGIC SHIFT TESTS
754470  XDJIM BYTOP STACK ADDRS MONIT BYTPT INDRK EXECT


CUMULATIVE STATISTICS FOR THE PHYSICAL REGISTERS


              TOTAL    TOTAL   FRACTION    AVRG.   USES PR.   USES PR.     USES PR.
REG   LIVES    LIVE     USES     LIVE     LENGTH     LIFE    LIVE INSTR  TOTAL INSTR
00      882    5299.    3084.    0.254      6.01     3.50       0.58        0.15
01      283    6630.    1634.    0.318     23.43     5.77       0.25        0.08
02     2217   11598.    7940.    0.556      5.23     3.58       0.68        0.38
03     1368    5092.    4199.    0.244      3.72     3.07       0.82        0.20
04      298    1262.     783.    0.060      4.23     2.63       0.62        0.04
05       21    4858.     231.    0.233    231.33    11.00       0.05        0.01
06       48     684.      72.    0.033     14.25     1.50       0.11        0.00
07        8    5058.     369.    0.242    632.25    46.12       0.07        0.02
10        0       0.       0.    0.000      0.00     0.00       0.00        0.00
11        8    5691.     517.    0.273    711.37    64.62       0.09        0.02
12        0       0.       0.    0.000      0.00     0.00       0.00        0.00
13       12    3462.     340.    0.166    288.50    28.33       0.10        0.02
14       12    1044.      48.    0.050     87.00     4.00       0.05        0.00
15      316   11376.    5122.    0.545     36.00    16.21       0.45        0.25
16      626    9346.    2705.    0.448     14.93     4.32       0.29        0.13
17       34    7427.    1168.    0.356    218.44    34.35       0.16        0.06

SUM OR AVERAGES:
       6133                      3.779     12.85     4.60       0.36        0.09   1.35


UNION OF USAGE CLASSES FOR THE PHYSICAL REGISTERS:


00  637707  CFXFL STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS INDRK EXECT
01  033757  CFXFL XIMDA STORE HALFW BYTOP LOGIC SHIFT ADDRS TESTS
02  023537  CFXFL XDAJM STORE BYTOP LOGIC SHIFT TESTS
03  023537  CFXFL XDAJM STORE BYTOP LOGIC SHIFT TESTS
04  021557  CFXFL XIMDA STORE BYTOP LOGIC TESTS
05  021310  XDATA STORE HALFW LOGIC TESTS
06  020100  STORE TESTS
07  002610  XDATA HALFW BYTOP SHIFT
10  000000
11  021320  XJUMP STORE HALFW LOGIC TESTS
12  000000
13  320350  XIMDA STORE HALFW TESTS BYTPT INDRK
14  020001  COUNT TESTS
15  020351  COUNT XIMDA STORE HALFW TESTS
16  021170  XDJIM STORE LOGIC TESTS
17  014110  XDATA STORE STACK ADDRS


UNION OF CLASSES AND COMPLEMENT

737777  CFXFL XDJIM STORE HALFW BYTOP LOGIC SHIFT STACK ADDRS TESTS BYTPT INDRK EXECT
040000  MONIT


APPENDIX D

The total SNIFT


TOTAL EXECUTED INSTRUCTIONS AND TIME: [values illegible in this copy]
274 DIFFERENT INSTRUCTIONS USED


THE SNIFT ORDERED BY NUMERIC OPCODE
WITH INSTRUCTION COUNT AND COMPUTED TIME

[Table with one entry per opcode, giving the mnemonic, the instruction count and the computed time; the entries are illegible in this copy.]

0 

0 00 

HLL  2n 

a 

14 

36  12 

HLL25 

a 

20 
57.  V*. 

HPL  2 

a 

1491 

3630.42 

HPL?! 

a 

1279 

188013 

HPL2M 

a 

76 
196  08 

HPL  2 5 

a 

1 

11.18 

52 

MLLO 

a 

10 

21.30 

UOI 

a 

0 

0.00 

HLL  CXI 

a 

O 

000 

HLL05 

a 

•) 

0 00 

HPLO 

a 

0 

0.00 

HPL  01 

a 

96 

141.12 

hpl  on 

a 

O 

0 00 

HPL  OS 

a 

0 

0.00 

S3 

HUE 

a 

0 

0.00 

HllEI 

a 

0 

0.00 

HLLEfl 

a 

0 

0.00 

HLLE5 

a 

0 

0.00 

HPlE 

a 

0 

0.00 

HPLE  1 

a 

0 

O.O0 

HPLEn 

a 

0 

000 

HPLE  5 

• 

0 

0.00 

51 

MRP 

a 

719 

1921.93 

HPPI 

a 

1210 

1948.10 

HPRH 

• 1 

4190 

2635.98 

HPPS 

a 

0 

0.0*) 

HP 

a 

21 

61.32 

Ml  P 1 

■ 

0 

o.oo 

Ml  PM 

a 

11 

33.11 

HIPS 

■ 

32 

91.81 

55 

HPP2 

a 

12307 

29986.01 

HPP21 

a 

1084 

159348 

HPP2M 

a 

re: 

2017.56 

HPP25 

a 

36? 
1053. 29 

HIP2 

a 

9271 

22520.53 

HLP21 

a 

0 

000 

MI.P2M 

a 

20 

51.60 

HIP25 

■ 

6 

17.22 

56 

HPPQ 

a 

16 
38. 88 

HPP01 

a 

561 

829.08 

hprou 

a 

5 

12.90 

HPP05 

a 

3 

861 

HlRO 

a 

0 

0.00 

HLP01 

a 

O 

e.oo 

UPON 

a 

0 

0.00 

HIPOS 

a 

0 

0 00 

57 

HP  RE 

a 

23 

55.89 

HPPCI 

a 

13 

19.11 

HPPEfl 

a 

4 

10.32 

HPRE5 

a 

5 

11.35 

HI  PE 

• 

367 
891- 81 

HI  PEI 

a 

O 

0 00 

HiPEn 

a 

1 

2 50 

HI  PE  5 

a 

0 

0.00 

GO 

TPN 

a 

0 

0.08 

TIN 

a 

0 

0.00 

TPNE 

a 

1211 

243236 

TINE  5440 

• 1066210 

TPNH 

a 

0 

0-00 

TINA 

• 

0 

0.00 

TPNN 

a 

7142 
14586. 3? 

TLNN 

a 

2966 
5811  10 

El 

TON 

a 

0 

0.00 

TSN 

a 

0 

0.00 

TONE 

a 

129 
376  60 

TSNE 

a 

0 

e.oo 

TDNA 

a 

0 

0 OO 

TSNA 

a 

0 

0.00 

TONN 

a 

9 

26.28 

T5NN 

a 

0 

0.00 

E2 

TP? 

a 

288 

561.18 

T12 

a 

4344 

8514.24 

TP2E 

a 

1117 

2777.32 

IL2E 

a 

361*56 

TR2A 

a 

0 

0.00 

T12A 

a 

9 

17.64 

TP2N 

a 

59 

115.64 

T12N 

a 

217 
125- 32 

63 

TO? 

a 

9 

26.28 

TS? 

a 

0 

0.00 

T02E 

a 

0 

0.00 

T52E 

a 

0 

0.00 

T02A 

a 

BOO 
2569. 60 

T52A 

a 

0 

0.00 

T02N 

a 

0 

e.oo 

T52N 

a 

0 

0.00 

El 

TPC 

a 

20O 

518.00 

TIC 

a 

1530 

2990.00 

TPCE 

a 

0 

0.00 

TICE 

a 

6 

11.76 

TPCA 

a 

0b3 

TIC  A 

a 

0 

0.00 

TPCN 

a 

0 

0.00 

. TICN 

• 

IB 

91.00 

65 

TOC 

a 

0 

0.00 

T5C 

a 

97 
203  21 

TOCE 

a 

0 

0 00 

T5CE 

a 

0 

000 

TOCA 

a 

0 

0.00 

TSCA 

a 

8 

O.OO 

TDCN 

a 

8 

0.00 

T5CN 

a 

8 

8.80 

66 

TPO 

a 

28 

51.98 

TLO 

a 

720 

1411.20 

I POE 

a 

0 

0.00 

TlOE 

a 

37 

72.52 

TPOA 

a 

2 

3.92 

TLOA 

a 

22 

43.12 

TPON 

• 

0 

0.00 

TLON 

a 

13 

25.10 

67 

TOO 

a 

0 

0 00 

T50 

• 

5.81 

TDOE 

a 

0 

0.00 

TSQE 

a 

0 

o.oo 

TDOA 

• 

18 

52  56 

TSOA 

a 

0 

0.00 

TOON 

a 

0 

0.00 

TSON 

• 

0 

0.00 

rH 


The  total  SN1FT 


THE GIBSON DISTRIBUTION


CLASS         COUNT     FRACT.    TOTAL TIME     FRACT.
 1  LOADST    4:3/64    0.4:30    11474:6.42     0.3557    LOADS AND STORES
 2  FIX+-     174307    0.1744    3.16604 1 4    0.1017    FIXED POINT ADD SUBTRACT
 3  COMPA
 4  BRANC     701014    o :ai0    609064.51      0.1096    BRANCHES
 5  FLT+-     49443     0.0494    7 '377.9 06    0 001*7   FLOATING ADD SUBTRACT
 6  FLMUL     2564 4    o «:s6    770:43.34      0.0066    FLOATING MULTIPLY
 7  FLDIV     11013     0.0110    15737: 50      0.0490    FLOATING DIVIDE
 8  FXMUL     11033     0.0110    101900.97      0.0310    FIXED MULTIPLY
 9  FXDIV     4763      0.0*40    77719.70       0.0740    FIXED DIVIDE
10  SHIFT     39074     0.0390    177761.69      0 »'S36   SHIFTS
11  LOGIC     9673      0.0097    70170.55       0.0063    LOGIC
12  MISCL     15351     0.0154    670 77. 61     0.0166    MISCELLANEOUS
13  INDEX
14  fuluo
15  I/O       707       0.0007    0.00           0.0000    I/O INSTRUCTIONS
16  CPU
17  MONIT     143       0.0001    0.00           0.0000    MONITOR CALLS
18  UUUO      3256      0.0033    0.00           0.0000    USER UUOS

THE PROGRAM STRUCTURE DISTRIBUTION


CLASS         COUNT     FRACT.         TOTAL TIME      FRACT.
 1  moi       701009    0.7»>11        490471*. s:     0.1677      MOVE MEMORY TO ACC.
 2  Ml DO     760 35    0.0760         T9’0m.55        O 1*614     MOVE ACC. TO MEMORY
 3  IMTOA     41378     0.0*1 i4       61 180.0*3      0.0191      MOVE IMMEDIATE TO ACC.
 4  SETA      4177      0.0047         613784          0.0019      SET 0 OR 1 IN ACC.
 5  SETO      4517      0 . 0*4 4 S    1I0CT.40        0.0034      SET 0 OR 1 TO MEMORY
 6  PWTOA     43973     0.0439         154 781.45      0.0407      MOVE PARTWORD TO ACC.
 7  ATOPW     0460      0 . 0(*OS      4070 V 93       0.0177      MOVE ACC. TO PARTWORD
 8  0L»W      570       0.0006         76475.70        0.01*87     BLOCK MOVE
 9  STBIT     7790      0.0073         14407.76        0.0045      SET BITS
10  UNUSD
11  UNUSD
12  ASONE     16537     0.0165         44481.83        0.0130      ADD OR SUBTRACT ONE
13  FIX+-     107845    0.1070         787177.31       o.on 0      FIXED ADD SUBTRACT
14  F I W     15 796    0. 01 SR       179700- 17      0 06!, 0    FIXED MULTIPLY DIVIDE
15  FLOAT     061 no    0.0061         709709.70       0. 77*'0    FLOATING ARITHMETIC
16  SHIFT     39074     0.0390         177761.69       0.0636      SHIFTS
17  LOGIC     9673      0.0097         70170.55        O.0*.|R3    LOGICAL OPERATIONS
18  UNUSD
19  UNUSD
20  IOXFR     674       O.OHOG         0.00            0.0000      I/O TRANSFERS
21  IOADM     45        0 . (u ll H i  0.00            0.0000      I/O ADMINISTRATION
22  UUOTH     143       0.0001         0.00            0.0000      OTHER MONITOR UUOS
23  UUUO      3256      0.0033         0.00            0.0000      USER UUOS
24  UNUSD
25  UNUSD
26  UNUSD
27  SRJMP     79*07     0.0791         01109. 33       0.0763      SUBROUTINE JUMPS
28  SRRET     730BC     0.0.39         768 *4.04       (».*V 30    SUBROUTINE RETURNS
29  STKPT     4 4 19*'  0.0447         IB' i969. 67    0.0563      STACKPOINTER OPERATIONS
30  nUSlO     COS 01    0.0705         366% - 9        M.“l 14     TEST ACC. VERSUS IMMEDIATE
31  AVS0      70099     0.0701         35977.71        O.o 17      TEST ACC. VERSUS ZERO
32  mUSOC     44571     0.0445         177437.75       0.0301      TEST ACC. VERSUS MEMORY
33  MEVS0     13061     0.0131         34t<09 7 1      0.0106      TEST MEMORY VERSUS ZERO
34  B(>'TS
35  BITST     704 t 7   0.0704         47493.37        0.0137      BIT TESTS
36  STATS     7473      0.0074         3513.30         0.1*01 1    STATUS TESTS
37  LOOPJ     35459     0.0355         67315.07        o.iCl“      LOOP JUMPS
38  UNCJP     74379     0.0744         113147.64       0.0357      UNCONDITIONAL JUMPS
39  NOOPS     00        0.0001         157.57          0.0Oi«l*    NO OPERATIONS
40  XCT       51 60     0.0057         7696.96         0.0074      EXECUTE EFFECTIVE ADDRESS
41  MISCL     741       0.0007         730.73          0.0* *i*7   MISCELLANEOUS



MOST TIME-CONSUMING INSTRUCTIONS EXCLUDING MONITOR CALLS

Relative execution time is with respect to the average instruction for this program.


NAME

USED USEC.

FRACTION OF TOT. TIME

CUMUL. FRACTION

RELATIVE EXEC. TIME

# TIMES EXECUTED

FRACTION OF # EXECNS.

1 HOVE 

466047.27 

0.1151 

0.1151 

0,7566 

191789 

0.1918 

Z ADO 

218847  SO 

0 06?9 

0.213*3 

0.8662 

79290 

0 0793 

3 FHPR 

213052,14 

0 06G3 

0??93 

3.121? 

19306 

00191 

4 HPVEH 

186515.94 

0 0601 

0.3371 

0.8033 

72293 

0.0723 

S PUSH 

123860.  S2 

0.0383 

0 3?S? 

1 2G72 

30236 

O.0302 

6 JPST 

103164  60 

0.0321 

0.1078 

0157- 

70180 

0.0702 

? rsc 

85641  96 

e o:g? 

0.1315 

3 301? 

7806 

0.0079 

8 FOUR 

79121  90 

o.ocig 

0 1691 

1.1622 

5533 

0.("‘56 

9 FS8R 

72620  64 

o o??g 

0.1017 

1 . 7560 

128  ?G 

D.0129 

18  FOV 

71906.20 

0.02:1 

0.5011 

1.152? 

5031 

0.0050 

11  POPJ 

66939.00 

o.oroe 

O.S2SO 

0.9901 

2 1 050 

0.0210 

1Z  IHUL 

63892  S3 

0 0199 

0.5119 

3.0513 

6513 

O.O*  *65 

13  FAOR 

61987  38 

0 0193 

0.561? 

1 6999 

11353 

00111 

M POP 

57909.10 

0.01  €1*1 

0.5822 

1.2921 

13951 

0.0110 

IS  PUSHJ 

56884. IS 

0.0177 

0.5999 

0 • 968 

10265 

0.0183 

16  FAD 

54843  68 

0.0P1 

0 . G 1 70 

1.5016 

1079G 

0.0108 

1?  HOVEI 

53030 . ZS 

O.OlGS 

0 6335 

0.15?? 

36076 

0.0361 

18  LD8 

47521  80 

0.0H8 

0.G183 

2 3816 

621? 

0.006? 

19  FHP 

43858.23 

0013? 

0.GG19 

3.2??2 

11-3 

0.001? 

ZO  101VI 

40779  80 

0.01Z7 

0.G71G 

1.919? 

2581 

0.0026 

Z1  CAHLt 

39781  50 

0.0121 

0.GB70 

0 8662 

11166 

0.0116 

ZZ  ASH 

3GS52.00 

00111 

0 6981 

0.717? 

15230 

00152 

23  IOIV 

3643940 

0.0113 

0. ?09 7 

5. 1996 

218? 

0.0O22 

2t  AOJA 

32751. 6J 

0.010? 

0.7199 

0 6573 

18297 

0.0183 

ZS  IHULl 

32660.68 

0010? 

0.7301 

2.5530 

3983 

p.nrnn 

26  AOS 

32119.55 

0 0100 

0 7101 

09196 

10531 

0.0105 

2?  SUB 

31201  50 

0.009? 

0 7198 

0.8562 

11316 

0.0113 

28  CAHGE 

30910.00 

0 0096 

0.7591 

0.8562 

11210 

0.0112 

29  HPPZ 

29906.01 

O.0O93 

0.7608 

0.7566 

12  3“  7 

00123 

38  ILDB 

28519.92 

0.0009 

0.7776 

2.1659 

36**1 

0. 0036 

31  8LT 

26425.20 

O.OUB? 

0.7859 

11.1339 

670 

0.0006 

32  FAOPH 

25604.26 

0 ■ 0080 

0.  ?93B 

2.0019 

398? 

0.0010 

33  HLP2 

22528.53 

0 . 0070 

fl.8000 

0 7566 

9??1 

0.0‘*93 

34  AO0I 

20395. 26 

0 . 0063 

0.6-V? 

0.55 ’3 

11391 

0.0111 

35  CAHL 

18944. ’5 

O.0O59 

0.8131 

0.8562 

6069 

0. 0**69 

36  L5H 

18312.00 

0 . 005? 

0 8180 

0.7172 

7630 

O.0076 

3?  F.iOPB 

17582.46 

0.0051 

0.0212 

2.0019 

^1— » 
■ ci 

0.002? 

38  1DP8 

16958.04 

0.0053 

0.0295 

2.-585 

1911 

0.0“ 19 

39  CmMG 

15903.25 

0.0050 

0 8315 

0.8562 

6783 

0 ““SB 

te  TPNN 

14586.32 

0.0015 

0. 6390 

o .6io: 

7112 

0 0071 

T1  FAOl 

13S55  98 

0.001? 

0.0132 

1 8163 

2206 

0.0023 

42  HOVN 

13303.17 

O.O011 

0 81 71 

0.8126 

5097 

0.0051 

43  CAIN 

12867.94 

0.0010 

0.8611 

6.6573 

7106 

0.0*172 

44  UFA 

17764.42 

0 0010 

0 8561 

1.5536 

2658 

0.0026 

45  CM1E 

12724.25 

0.0010 

C.B693 

0.0562 

162? 

0.0016 

46  HPPH 

17635  90 

0 0039 

0.B633 

0 93?1 

1198 

(*.**“1? 

4?  UNE 

10662.40 

0.0033 

0.0066 

0.6102 

5110 

0 0*  ’51 

48  JUMPS 

10414.72 

0.003? 

0.8690 

0.5573 

5818 

0.0*  *68 

49  FHPR1 

10779.85 

0.003? 

0.8730 

2 -866 

1113 

0. 0**11 

SO  ASHC 

9956.70 

0.OO31 

0.0761 

11976 

2070 

0 0021 

SI  AOJL 

9587.74 

0.0O3O 

0.8791 

0.5S73 

5366 

0. 0**51 

52  JSP 

9078.79 

O . OO?0 

0.0019 

0.068- 

3261 

0.0033 

S3  JRA 

890504 

O.00?B 

0.8017 

0.9776 

-836 

O.0028 

54  TlZ 

8514.24 

0.00?? 

0.0073 

0 6107 

1311 

O.  >»\»13 

S5  POTC 

8477.17 

0 . 00?6 

0.8900 

1.1976 

1752 

0.0018 

S6  SF1PN 

8312. 85 

0.002G 

0 8925 

0.8126 

3186 

0.01*3? 

S2  J5A 

8239.16 

0.002G 

0 8951 

0.912? 

2812 

0.0028 

S8  SF1PCE 

8119.71 

0O0?5 

0.89  ’G 

08126 

3111 

(*.0031 

59  RQT 

887800 

0.0025 

0.9O01 

0.7172 

3315 

U.  (**>33 

60  UNO  I 

7901  88 

n.0i  05 

0.902G 

0.6013 

I’j-’B 

(*.1*019 

61  SU81 

7763.23 

0.0021 

0.9050 

0.5573 

1337 

0.0O13 

62  AND 

7661. 17 

0.0*01 

0.9‘i7' 

0 800? 

2901 

0 0030 

63  CA1E 

7600.34 

0.  l'0?1 

0 • 9*'3'J 

0.5673 

1216 

0.0*11? 

64  XCT 

7S9G  96 

0.0021 

0.912! 

O 1577 

51 6B 

0.0062 

65  JUMPLE 

74 78.67 

O.0023 

0.9H5 

0.5573 

1178 

0.0012 

66  SF1PE 

7057  44 

0.0022 

0.91G? 

0.8126 

2701 

0.0*  *27 

67  JSP 

6995- 73 

0.0O22 

0.9180 

0.1577 

1769 

0.0018 

6e  ops 

69’8  SO 

O.0O22 

0.9210 

2.6161 

821 

0.0*  *4*8 

69  JUHPL 
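
The relative execution time column of the table above can be read as an opcode's mean time per execution divided by the mean time per instruction over the whole program. The sketch below is a minimal illustration under that reading. The function name, the parameter names and the sample figures are hypothetical and are not taken from the tracing tool or from the tables.

```python
def relative_execution_time(usec_used, times_executed,
                            program_usec, program_executions):
    """Mean time per execution of one opcode, divided by the
    program-wide mean time per executed instruction."""
    per_execution = usec_used / times_executed
    program_average = program_usec / program_executions
    return per_execution / program_average


# Hypothetical figures: an opcode that used 500 usec over 100 executions,
# in a program that used 30000 usec over 10000 executed instructions.
print(relative_execution_time(500.0, 100, 30000.0, 10000))
```

Under this reading, a value below 1 marks an instruction that is cheaper than the program's average instruction, and a value above 1 marks one that is more expensive.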
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0.9031 
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0.070? 
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5356 

0.0054 

0.7793 

3S 

XCT 

5168 

0.0052 

0 7045 

36 

MOVN 

S"97 

00051 

0.7896 

37 

FOV 

5034 

e.ooso 

0.7946 

38 

ONOI 

4908 

O.0O49 

0.799S 

39 

JSP 

4 7S9 

0.0040 

0.8043 

48 

COME 

4677 

0.0046 

0.0009 

41 

TLZ 

4344 

0.0043 

0.8132 

42 

SUBI 

433" 

0.0043 

O.0P6 

43 

COIE 

4246 

0.0042 

0.0210 

44 

HPPM 

4198 

0.0042 

O.026O 

45 

JUMPLE 

4170 

0.0042 

0.03‘.'2 

4G 

FMP 

4173 

O.0O42 

0.0344 

47 

lnuil 

3983 

0.0O4t3 

00104 

48 

FhOPM 

3982 

0.0010 

0.0423 

49 

jumpl 

3827 

0.0030 

0.0462 

SO 

setz 

3773 

00030 

0.0499 

SI 

coic 

3706 

0 0037 

0.8536 

S2 

JUMPN 

3’03 

0.*037 

0 8573 

S3 

1108 

3601 

0.0036 

0.0609 

S4 

POT 

3345 

0.0033 

0.0643 

ss 

M3VS1 

3790 

0.0033 

0 06^6 

ss 

JSP 

32SI 

O.O033 

0 8700 

57 

SUPN 

3105 

0.0032 

0 0740 

SB 

S'  1PCE 

3111 

0.0031 

0 0771 

S3 

HND 

7901 

0.0030 

0 8801 

60 

TLNN 

796S 

o.  <030 

0 0031 

61 

SOJCE 

705n 

0.0020 

0 0059 

62 

JPO 

2036 

0.0020 

0 0800 

63 

coile 

7825 

0.00:0 

0.0916 

64 

jso 

2012 

O 0020 

0.0944 

65 

FhORB 

I'll 

0.0027 

0.09  "’I 

66 

S*  IPE 

2701 

0.0027 

0 0990 

67 

101V1 

Tsai 

0.0026 

0.9074 

68 

u^o 

2550 

0. 0‘C6 

O.90S0 

63 

movm 

:si9 

0.0025 

O 9075 

70 

SET2M 

7440 

0.»»‘21 

0.9099 

71 

oaj 

?4  34 

O.O024 

0.9124 

7" 

JFCL 

2190 

P.0024 

0 9147 

73 

S>  1PC 

mi 

o.O»*23 

091  70 

74 

SOJ 

?:93 

O.0O23 

0.9193 

?S 

FhOI 

7706 

0 0023 

0 9716 

76 

low 

:\b: 

0.0022 

0.9730 

77 

HPLI 

2101 

0 . 0022 

0.9760 

78 

hSHC 

70  ?o 

0.0021 

0.9201 

73 

M0VN1 

"ft!  3 

0.0020 

0.9301 

80 

17PB 

1914 

0-0019 

0.932O 

81 

OTS 

1060 

0.0019 

0 9339 

82 

SOJC 

1041 

O.00  IB 

0.9357 

83 

CHICE 

1812 

0.0018 

0 9375 

84 

S'  IPO 

1764 

0.0010 

0.9393 

BS 

SET  28 

1760 

0.0010 

0.9UO 

BE 

P3TC 

irs: 

O.O010 

0.9420 

87 

E«CH 

1737 

n.oul 7 

0 94  45 

88 

HC 

1530 

0.0015 

0 9461 

09 

CHMN 

1516 

O.ooiS 

0.9476 

90 

CHlfi 

1504 

O.OOis 

0. n401 

91 

HPL2 

1494 

O 001S 

0.95->6 

92 

HLL2 

I486 

0.0015 

0.9521 

33 

JUMPCE 

1431 

0.0014 

0.9635 

94 

TR2E 

1417 

0.0014 

0 9549 

95

HPL21 

1279 

0 *13 

0.966. 

95 

SOS 

1779 

o.o»t|3 

0.95?5 

97 

A08JP 

1277 

0,0013 

0 950? 

98 

rooPi 

12S0 

O.OOl? 

0.9600 

99 

TONE 

1211 

0.001? 

o.or.i; 

100 

s-m 

1777 

0,001? 

(•  ?G?4 

101 

HPRl 

1710 

0.001 2 

0 963? 

107 

jrro 

1208 

o.ooi ? 

0.9649 

103 

0000 

1180 

o.ooi? 

0 9660 

101 

JUMPS 

111? 

0.0011 

0.96": 

105 

roppi 

1113 

O.OOI 1 

0.9683 

106 

M3UW1 

1138 

0 0011 

0.969S 

107 

ISHC 

1111 

0.0011 

0.9  06 

108 

HPPZl 

1081 

0.0011 

0.9?1? 

109 

SOJL 

986 

0."010 

0.9?:? 

110 

nous 

919 

0 . 0009 

0.9736 

111 

A OSCE 

919 

0.0009 

0 .3716 

112 

TOZh 

680 

0.*  0 .9 

0.97S4 

113 

DOB 

871 

0 . 0008 

0 9763 

111 

HPR2M 

?87 

0 . OiVfl 

0.9770 

115 

HPP 

719 

0.000? 

0. 9??8 

116 

HPI. 

717 

0 r>007 

O.9'05 

117 

MIL 

738 

Q.OOO? 

0.903 

118 

coil 

726 

0 000? 

0.90‘h* 

119 

tlu 

720 

O.'X'O? 

0.9007 

120 

008  JN 

709 

O.Wh'.j 

0.9814 

171 

10P 

702 

OO'.'O’ 

0.90:1 

172 

SOSlE 

610 

0.0006 

0 90:? 

173 

HPLM 

581 

0.0006 

U 9833 

121 

OS! 

5 ?6 

0. 0". '6 

0.9039 

175 

Olt 

S70 

0.0006 

0 904b 

176 

Hf’POl 

561 

il.oOl  <6 

0 90SO 

127 

novsn 

55G 

o.  oi  *1.6 

0.90.6 

128 

roon 

516 

0.0006 

0.9061 

129 

S>  1PLE 

S38 

0 . 0.v«s 

0.900? 

130 

F0PP8 

SI  7 

0.0 OOS 

0.90?? 

131 

rsnpi 

162 

O.OO.'S 

0.9076 

132 

SOSN 

1S6 

(*.»'»  M.»S 

0.9081 

133 

007 

119 

O.i««01 

0.91185 

131 

5056 

175 

0.0004 

0.9090 

135 

0><6 

127 

O.f»004 

0.9094 

’36 

nouns 

107 

0.0004 

0.9090 

137 

HPP2S 

367 

0.00.  .4 

0.990? 

138 

HI  PE 

36? 

0.0004 

0.9905 

139 

fdvpi 

371 

0 O‘*03 

0.9909 

110 

SETO 

319 

0 . ooo3 

0.991? 

111 

nouns 

31S 

0.0003 

0.9915 

112 

001 

308 

0.0003 

0.9910 

113 

St  ton 

305 

0.0003 

0.9971 

HI 

TP? 

788 

<1. 0**03 

0.99? 4 

115 

0008 

287 

0.0003 

»*.  99?? 

116 

rsa 

78? 

0.0003 

0.9910 

117 

I PC 

780 

O.OO03 

0.993? 

118 

rnpi 

272 

O.OO03 

0-9935 

119 

SETcn 

270 

0.0.  .03 

0 9938 

ISO 

SOJE 

761 

0-0'» 03 

0 9940 

151 

10P 

211 

0.0 00: 

0.9943 

157 

rsapn 

736 

O.Oi'H*? 

0 9945 

153 

HOSC 

236 

0.000? 

0 9940 

151 

SOJrt 

233 

o.ooo? 

0.9950 

155 

SOJLE 

223 

0.000? 

0- 90S? 

155 

lnuin 

220 

o.uoo: 

0.9954 

157 

Tl?N 

21? 

0 000? 

0.  995? 

1SB 

Tl?E 

186 

0 000? 

*•9960 

159 

nuu 

1?S 

0.000? 

0. 9960 

160 

EQU 

171 

0.000? 

0.996? 

161 

OPCn 

l?3 

oooo: 

0 9964 

167 

HOSCE 

171 

0.000: 

0.9965 

163 

rnppn 

158 

O.OoO? 

0 996  ? 

161 

SOJN 

ISO 

O.ooi*? 

0.9960 

165 

017 

137 

O.tH'rtl 

0.9970 

166 

«0P 

13? 

0.  O.'Ol 

0. 99"*1 

16? 

TONE 

129 

0.0OO1 

0.9973 

168 

AOJE 

125 

0.0OO1 

0.9974 

159 

POSH 

172 

0.0001 

0.9975 

170 

nuc 

117 

0.0.  .01 

0.99  T. 

171 

loon 

117 

0 0001 

0.99?? 

172 

T5C 

97 

0.0001 

0 .9978 

173 

HPL01 

96 

0 . 000 I 

0.9979 

171 

Oil 

89 

0.1*001 

0.9900 

175 

pNOCni 

87 

0..V *01 

0.9981 

16 

r«08 

86 

0 i*Ot»l 

0. 990? 

177 

roun 

86 

0.0001 

0.9983 

178 

HPizn 

76 

0.0001 

0.9903 

179 

coi 

71 

0.0001 

0.9304 

180 

X0P8 

77 

0.0001 

0.9905 

181 

10P8 

66 

0.0' *01 

0.9986 

18’ 

SETZ1 

65 

0.00 01 

0.9986 

183 

1 r’?N 

S9 

0.0001 

0.990" 

181 

010 

56 

0.0001 

0. 9987 

185 

ONOCn 

5S 

0.0 '^1 

0.9908 

186 

oNOCon 

19 

0.0000 

0.9900 

18? 

TICN 

18 

0 . 0.tiVt 

0.9909 

188 

1QP1 

16 

0.0000 

0 9909 

189 

SOSCE 

11 

O.Ortrtft 

0.999* 

190 

SETC0M 

11 

0.0000 

0.9990 

191 

AOJN 

11 

0.0000 

0.9991 

192 

or? 

10 

0.0000 

0.9991 

193

tide 

37 

o . oooo 

0.9991 

194 

012 

34 

o.OOiio 

0.9992 

1 99 

HUM 

33 

0 . 0000 

0.9992 

19G 

FO’JPB 

33 

0 0000 

0.9997 

19** 

HLPS 

32 

0. noon 

0 9993 

190 

X0P1 

3‘* 

O.(n>i0 

0. 9993 

199 

hO.JG 

29 

0.0000 

0.9993 

200 

TPO 

20 

o oitfm 

0.5991 

201 

0G2 

2? 

0.0001) 

0.DJD4 

202 

1MU10 

25 

0 . nooo 

0.9994 

203 

HPPE 

23 

0 . nooi) 

0.9994 

201 

TLOA 

22 

0. OOOn 

0.9996 

205 

POJIE 

21 

o.  ouno 

0.9995 

206 

HLP 

21 

O.OilOO 

0.99% 

20? 

OPCO 

20 

O.ooon 

0-99% 

200 

HlL25 

20 

0 - O'  'l»  » 

0.9996 

203 

HLP2M 

20 

0.0000 

0.9996 

210 

H\0Ch 

19 

0 . 0000 

0-9996 

211 

TOOh 

10 

0.0000 

0.9996 

212 

HPL5 

17 

0 . OOOO 

0.9996 

213 

HPPO 

16 

0.0000 

0.9996 

214 

070 

15 

0. OOOO 

0.9996 

215 

odse 

15 

0 . OOOO 

0.999? 

216 

071 

14 

0 . 0000 

0.999? 

217 

HU2M 

14 

0. 0000 

0.999? 

210 

5ET01 

14 

0 . 0000 

0.999? 

219 

JUMP 

14 

0.0000 

0.999? 

220 

HPPE  1 

13 

0.0000 

0.999? 

221 

TLDN 

13 

0 . OOOO 

0.999? 

222 

hNOM 

13 

0.0000 

0 . 9990 

223 

M0U55 

12 

0.001)0 

0.9998 

224 

snoB 

12 

0. 00"0 

0.9990 

225 

5U0M 

11 

0.0000 

0.9998 

226 

HI  PM 

11 

0.0000 

0.9990 

22? 

017 

11 

0.0000 

0.9998 

220 

HLLD 

10 

0 . i)00O 

0.9990 

229 

TDNN 

9 

0.0O00 

0.9990 

230 

TO? 

9 

0.0000 

0.9990 

231 

TL2H 

9 

0 . 0000 

0.9990 

232 

5D5E 

0 

0.0000 

0.9999 

233 

DPCM1 

7 

0 . 0000 

0.9*199 

234 

A05N 

6 

0.0000 

0.9999 

235 

UCE 

6 

0. OOOO 

0.9999 

236 

040 

6 

0.0000 

0.9999 

23? 

HLP25 

6 

0.0000 

0.9999 

230 

HPPOM 

5 

0.0000 

0.9999 

233 

SDSL 

5 

0 . 0000 

0 9999 

240 

066 

5 

0.01 100 

0.9999 

241 

HPPE  5 

5 

0 . 0001) 

0.9999 

242 

5ETCMM 

5 

0.0000 

0.9999 

24  3 

ED’JPM 

5 

0 . 0000 

0.9999 

244 

041 

4 

0.0000 

0.9999 

245 

5D5A 

4 

0.0000 

0.9999 

246 

021 

4 

0.0 OO0 

0.9999 

24  7 

015 

4 

0 . 0000 

0.9999 

240 

050 

4 

0 . 0000 

0.9999 

249 

HPPEM 

4 

0 • 0000 

0.9999 

250 

*OPM 

4 

0.0000 

0.9999 

251 

HPL25 

4 

0 . 0000 

0. 9999 

252 

HPPOS 

3 

o.ooon 

1 . 0000 

253 

OFN 

3 

0 . 0000 

1 . OOOO 

254 

061 

3 

0 . 0000 

1 . 0001) 

255 

&NDCB 

3 

0.0000 

1 . 0000 

256 

064 

3 

0.0000 

1 . none 

25? 

020 

3 

0.0000 

1 . 0000 

250 

0G5 

3 

0 . 0000 

1 . 0000 

259 

SETCA 

3 

0 . 0000 

1 . 0000 

260 

063 

3 

0.0000 

1.0000 

261 

076 

2 

0.0000 

1.0000 

262 

002 

•» 

L. 

0.0000 

1 . 001)0 

263 

035 

L. 

0 . 0000 

i .oooi) 

264 

ANDB 

•y 

0.1)000 

1 .01100 

265 

056 

2 

0.0000 

1 . 0001) 

266 

016 

2 

0 . 0000 

| . 0000 

26? 

003 

2 

0.0000 

i . 01,100 

260 

TPDA 

2 

0 . 0000 

!.0"00 

269 

T5D 

•» 

L. 

o . noon 

1.0O"0 

270 

FDV1 

1 

0.0000 

1 . O"00 

271 

HLPEM 

1 

O.OuOO 

1 . 0000 

272 

AOJGE 

1 

o.nooo 

l . 0000 

273 

5ETCMB 

1 

0.0000 

1 . 0000 

274 

057 

1 

o.oooo 

1 . 01)00 
INSTRUCTION SET UTILISATION


INFORMATION THEORETICAL:


BY # EXECUTED INSTRUCTIONS. ACTUAL: 5.1816

BY EXECUTION TIMES. ACTUAL: 5.6789

THEORETICAL MAXIMUM: 8.7215
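
One plausible reading of these figures, stated here as an assumption because the listing does not define them, is that each "actual" value is a Shannon entropy in bits of the opcode distribution, weighted either by execution counts or by execution times, and that the "theoretical maximum" is the entropy of a uniform distribution over the available opcodes. The sketch below illustrates that reading only. The function names and the sample counts are hypothetical.

```python
import math


def entropy_bits(weights):
    """Shannon entropy, in bits, of the distribution obtained by
    normalising a collection of non-negative weights."""
    weights = list(weights)
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def max_entropy_bits(n_symbols):
    """Entropy of a uniform distribution over n_symbols symbols."""
    return math.log2(n_symbols)


# Hypothetical execution counts for five opcodes (not taken from the tables).
counts = {"MOVE": 8, "ADD": 4, "JRST": 2, "PUSH": 1, "POP": 1}
print(entropy_bits(counts.values()))   # 1.875 bits
print(max_entropy_bits(8))             # 3.0 bits for an 8-opcode set
```

Passing per-opcode execution times instead of counts gives the time-weighted variant. The further the actual value falls below the maximum, the more unevenly the opcode space is used.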


FOSTER-GONTER-RISEMAN FUNCTION


#OPCODES USED

#OPCODES RECODED

#OPCODES INTERP.

FRACTION INTERP.

% INCR. IN # EXECUTED INSTR. IF CODING FACTORS ARE 2, 4.

2?1 

0 

i) 

0 . POO0 

0 . fulfil* 

O.Ofmn 

0.01100 

273 

1 

1 

0.0‘V; 

0.01*1 10 

0 . 0000 

(*.  (10  00 

272 

2 

2 

O.i'OOO 

OOiiOO 

0.0*  mO 

0.00*1(1 

271 

3 

3 

0.0*  *00 

0 . nil- 11,1 

(1.0000 

|*.  (11*00 

220 

1 

1 

0 . 0000 

0.11.100 

0.0*1  '0 

n.  rmno 

259 

S 

S 

0.0000 

n.  00ii0 

|*.  |ll*0O 

0.001*0 

259 

6 

7 

0.0000 

0 . 0l.li  111 

0.001.10 

0 . 1 H 111  | 

262 

7 

9 

0 . 0000 

0.0000 

0.0000 

0.00*11 

266 

8 

11 

(1.  ( 1*100 

0.0000 

O.mmo 

0 . non  1 

265 

9 

13 

0.00110 

0.0000 

O.OoOl 

0.0*1  ’1 

261 

10 

IS 

0.01100 

0 0000 

0.001)1 

(*.0001 

263 

11 

17 

0.0000 

0.(1000 

0.001*1 

0. 0**01 

262 

12 

19 

0.0000 

o.noiAi 

0.0001 

0.0007 

261 

13 

21 

0.00*10 

0.0000 

0.00*11 

O.00H2 

260 

11 

23 

0.0000 

0.01100 

0.0001 

0.001*2 

259 

IS 

26 

0.0000 

0.0>i| 

0.0001 

O.O  n**2 

259 

16 

2D 

0.0000 

O.OOl'l 

0.01)01 

ij.  0007 

252 

17 

32 

0.0000 

n. 0001 

0.0*101 

0.0003 

265 

18 

3S 

0.011110 

0.01*01 

0.0001 

0.0003 

j5 

19 

30 

0.0000 

0.0*101 

0.0002 

(1.O1103 

251 

20 

11 

0 . 0000 

0.0*101 

O.0*il*2 

0.01*03 

253 

Z1 

11 

0.0000 

0.01*01 

0.00i*? 

0.01*01 

252 

zz 

17 

0,0000 

0.0001 

0.0002 

0.0001 

251 

23 

SO 

0.0000 

0.0001 

0 . 0*102 

O.00O1 

250 

21 

51 

0.0001 

0.0001 

0.0ini2 

O.0HO1 

219 

zs 

50 

0.0001 

0.0001 

0.O00Z 

O . O*  *06 

219 

Z6 

62 

0.0001 

0.0(10! 

0.0002 

0.  C<  i05 

217 

27 

66 

0.0. 101 

0.(11101 

0.0003 

0.0*  H *5 

216 

70 

70 

0.0001 

0.0001 

0.0*103 

0.00O6 

215 

29 

71 

0.0001 

0.0001 

O.0M03 

H . 001 16 

211 

30 

"0 

0 • 000 1 

0.0002 

*1.0*103 

0.0006 

213 

31 

0; 

0. 0001 

0.OOO2 

O.(»0il3 

O.00O 7 

212 

32 

O: 

0.0001 

(».0»io2 

O.00O3 

O.OO07 

211 

33 

92 

0.0001 

ti.OOOZ 

0.0001 

0.0007 

210 

31 

97 

0.0001 

0.0002 

0.0001 

0.0' *08 

239 

35 

102 

O.  mml 

O.OO.  *2 

0.0001 

O.Oni'O 

23B 

36 

l.V 

0.0001 

n.  of  Mir 

0.01*01 

*1.0*109 

232 

37 

112 

O.O001 

0.0002 

0.0001 

0.0009 

236 

30 

110 

n.onoi 

0.0(102 

0.1*006 

O.i«(i09 

235 

39 

121 

0.0001 

0.01*02 

O.00O5 

0.0010 

231 

10 

130 

0 . 000 1 

0.001*3 

O.00O5 

0.0O1O 

233 

11 

136 

O.0OOI 

O. 0003 

(*.  *1005 

0.0011 

232 

12 

113 

0.0001 

0.00O3 

0. 0*ii*6 

0.0011 

231 

13 

151 

0 . 0002 

0.0003 

0.00*16 

0.0*117 

230 

11 

160 

O.OO02 

0. 00*13 

0.0006 

0.0013 

229 

15 

169 

0 . 0(102 

0.OO03 

0. 0007 

0.0011 

228 

16 

170 

0 . none 

n.0001 

0.0007 

0.0011 

222 

17 

100 

0 . non; 

n.  11001 

n.oiTiia 

0.00 is 

226 

10 

199 

0 . 0007 

0.0001 

0.  ivrno 

O.oniG 

225 

19 

210 

0.0002 

0 . 0001 

n.00 no 

0.0017 

221 

so 

221 

0 . 0002 

0 . 1*001 

0.01*09 

O.oniB 

273 

SI 

233 

n.(inn2 

0.0005 

0.0003 

0.0019 

222 

52 

215 

0.0002 

(i.OOiiS 

0.1*010 

0.0020 

221 

53 

250 

O.0O03 

C*.  0**05 

0.001O 

0.OO71 

270 

51 

271 

0.000.3 

0.0005 

0.0011 

0.0027 

719 

S5 

701 

0.0003 

0.0006 

0.0011 

0.0023 

219 

SB 

290 

O 0003 

0 . 1 KiiJ6 

0.0012 

0.0021 

21? 

S? 

312 

0.0003 

O.0H06 

0.0012 

0.0025 

216 

50 

376 

0. 0003 

0. coo? 

0.01)1.3 

0.OUZ6 

215 

59 

310 

0 . 0003 

0.0007 

O . Oil  1 1 
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1 0-101 

3.616? 

32 

747 

73  MHO 

0.7315 

0.4610 

4«.9769 

1 .0619 

3 7*43(1 

31 

24  3 

717071 

0 73  *0 

0.4  74-1 

*4.9401 

1 . G9G7 

3.  .".!'*  3 

30 

244 

247004 

0.7470 

0. 4066 

i*  9/(7 

1 9474 

3(10411 

29 

745 

740672 

0.7486 

0-  «9V 

•*.  9946 

1 . PO‘10 

3 9.79 

20 

246 

754B34 

0.7540 

0.5-197 

1 0193 

7 . ft  111  ? 

4.07/3 

74  7 

76134  7 

0.7611 

0.677/ 

1.*'I64 

r."3'-n 

4.10(6 

26 

740 

2607 16 

0.2607 

0 5366 

1 .*'729 

1169 

4. '910 

25 

749 

276477 

0.7754 

0.65-40 

1.101? 

7.7031 

1 .4**6? 

24 

750 

787064 

•V7029 

0 5C-57 

1.1315 

2.. '*(.79 

4.6760 

23 

261 

29" >9 > 

m.79‘5 

0.5010 

1. 162*' 

2.3739 

4 . 61 79 

22 

257 

790 300 

0.2904 

ft.  6960 

i • 1936 

7-3070 

4 . ? <‘4  ( 

21 

253 

30  7b'jl 

0.  3i*  76 

ft. 61 53 

1.2306 

2 4617 

4 9/74 

20 

754 

31010'' 

0.3(02 

O.G164 

1.2/2? 

7.6464 

6.1*9*49 

19 

765 

37119*0 

0.  3790 

0.6600 

1.3169 

2.6318 

5.76.16 

IB 

266 

340710 

0.340? 

ft.  60*14 

1 16»'9 

:r:\? 

5.4416 

17 

257 

151564 

0.3516 

ft. 7031 

1 . 4067 

7- 0175 

6.6.  6*« 

16 

250 

367917 

0 3629 

0.  ,*760 

(.4617 

2 94*33 

6.(1*  '60 

15 

269 

374311 

0.3743 

0. 7406 

1 4972 

2.9916 

5. 9009 

14 

760 

3B0610 

0.31166 

0.7737 

1.64G6 

3.0979 

6.(059 

13 

261 

399494 

0.3995 

0.  7‘l9ft 

( .6900 

3. 1959 

6 39(9 

12 

267 

413440 

0.4114 

0.8769 

1.66.30 

3.3*476 

6.6161 

It 

263 

477914 

0.4779 

0 056(1 

1.7116 

3 47J.1 

6 (HOG 

10 

764 

44  31*14 

0. 4431 

0-0063 

1 . .*726 

3.5151 

? *4001 

9 

265 

461409 

0-4614 

0.9770 

1 . 0466 

3.6913 

/.  7*1176 

B 

266 

4 ’9706 

*4  479? 

ft  9694 

1.91811 

3 0376 

?.(»?6'i 

7 

767 

499092 

0.4991 

ft. 9907 

1 . 99G4 

3.9977 

7.9064 

6 

760 

570147 

0.5701 

1.0409 

7- "13**6 

4.1611 

(1.  .1  *."? 

5 

269 

550370 

0.5501 

1 . 1**08 

7.7015 

4.4030 

8.0416*4 

4 

270 

506163 

0 . 51165 

1- 1729 

2.3460 

4.6916 

9. 3037 

3 

271 

666633 

0 6566 

1 3133 

7.6765 

5.75343 

(ft.5>'6( 

2 

272 

770976 

0.7709 

1 4570 

2.9(5? 

5.B31-1 

(1.6078 

1 

273 

000716 

ft  0002 

1.6161 

3.2320 

6.4G5? 

12.9114 
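
The rows printed above look consistent with the following reading, which is an assumption rather than something the listing states. Opcodes are removed one at a time starting from the least used. Each row gives the number of opcodes left, the number removed, the cumulative number of executions of the removed opcodes, that number as a fraction of all executed instructions, and the same fraction multiplied by each coding factor. The sketch below reproduces that sweep. The function name, the factor tuple and the sample counts are hypothetical.

```python
def recoding_sweep(counts, factors=(2, 4, 8)):
    """Return one row per number of removed opcodes, least used first.

    Each row is (opcodes_left, opcodes_removed, executions_affected,
    fraction_affected, [fraction_affected * f for f in factors]).
    """
    ordered = sorted(counts.values())      # least-used opcode first
    total = sum(ordered)
    rows = []
    affected = 0
    for removed in range(len(ordered) + 1):
        if removed > 0:
            affected += ordered[removed - 1]
        fraction = affected / total
        rows.append((len(ordered) - removed, removed, affected, fraction,
                     [fraction * f for f in factors]))
    return rows


# Hypothetical execution counts for three opcodes.
for row in recoding_sweep({"A": 6, "B": 3, "C": 1}):
    print(row)
```

Sorting in ascending order means the sweep always removes the opcode whose loss affects the fewest executed instructions, which is why the early rows of such a table stay near zero.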
APPENDIX E

Listing of the short subject algorithms


ALGOL PROGRAMS


BAIRSTOW


BEGIN

COMMENT THIS IS ALGORITHM 30 FROM THE CACM ALGORITHMS SECTION

TYPE-IN AND CALLING PROGRAM BY A. LUNDE;


ARRAY COEFFS[0:12],RREAL[1:12],RIMAG[1:12],CONCON[1:12];
INTEGER IDEG,ITER,NDIGS,IX,ISET;


PROCEDURE PUTOUT(ID);
VALUE ID; INTEGER ID;
BEGIN
  WRITE("[2C]DATA SET "); PRINT(ID,3,0);
  FOR IX := 1 STEP 1 UNTIL IDEG DO
  BEGIN
    WRITE("[C]");
    PRINT(RREAL[IX],6,7); PRINT(RIMAG[IX],6,7);
    PRINT(CONCON[IX],13,2);
  END; ! OUTPUT LOOP;
RETURN:
END; ! PROCEDURE PUTOUT;


PROCEDURE ROOTPOL(NDEG,XTCOF,LITER,NFIGS,RRE,RIM,CONV);
VALUE NDEG,LITER,NFIGS;
INTEGER LITER,NFIGS,NDEG;
ARRAY XTCOF,RRE,RIM,CONV;
BEGIN
  INTEGER I,J,M;
  ARRAY COF,B,C,D,E[-2:NDEG];
  REAL TST,ACCUR,PS,QS,PT,QT,SCL,P,REV,R,Q;


  PROCEDURE REVERSE;
  BEGIN
    TST := -TST;
    M := ENTIER((NDEG-1)/2);
    FOR J := 0 STEP 1 UNTIL M DO
    BEGIN
      SCL := COF[J]; COF[J] := COF[NDEG-J];
      COF[NDEG-J] := SCL;
    END; ! SWAPPING LOOP;
  END; ! REVERSE;


  INTEGER PROCEDURE LINEAR;
  BEGIN
    IF TST < 0.0 THEN R := 1.0/R;
    RRE[NDEG] := R; RIM[NDEG] := 0.0;
    CONV[NDEG] := ACCUR;
    NDEG := NDEG-1;
    FOR J := 0 STEP 1 UNTIL NDEG DO
      IF ABS(COF[J]/D[J]) < ACCUR THEN COF[J] := D[J]
      ELSE COF[J] := 0.0;
    LINEAR := NDEG;
  END; ! PROCEDURE LINEAR;


  B[-1] := B[-2] := C[-1] := C[-2] := D[-1] := E[-1] :=
  COF[-1] := 0.0;

  FOR J := 0 STEP 1 UNTIL NDEG DO COF[J] := XTCOF[J];
  TST := 1.0; ACCUR := 10.0^NFIGS;


  COMMENT WHILE COF[NDEG] = 0.0 DO;
ZROTEST:
  IF COF[NDEG] = 0.0 THEN
  BEGIN
    RRE[NDEG] := 0.0; RIM[NDEG] := 0.0; CONV[NDEG] := ACCUR;
    NDEG := NDEG-1;
    GO TO ZROTEST;
  END;


  COMMENT UNTIL NDEG = 0 DO;
  BEGIN
INIT:
    IF NDEG = 0 THEN GO TO RETURN;
    PS := 0.0; QS := 0.0; PT := 0.0; QT := 0.0;
    SCL := 0.0;
    REV := 1.0; ACCUR := 10.0^NFIGS;

    IF NDEG = 1 THEN
    BEGIN
      R := -COF[1]/COF[0];
      LINEAR;
      GO TO RETURN;
    END;

    FOR J := 0 STEP 1 UNTIL NDEG DO
    BEGIN
      IF COF[J] # 0.0
      THEN SCL := LN(ABS(COF[J]))+SCL;
    END;
    SCL := EXP(SCL/(NDEG+1));

    FOR J := 0 STEP 1 UNTIL NDEG DO COF[J] := COF[J]/SCL;
    IF ABS(COF[1]/COF[0]) < ABS(COF[NDEG-1]/COF[NDEG])
    THEN REVERSE;


    COMMENT WHILE TRUE DO ! FIND LIN OR QUAD FACTOR;
    BEGIN
REVSED:
      IF QS # 0.0 THEN
      BEGIN
        P := PS; Q := QS;
      END ELSE
      BEGIN
        IF COF[NDEG-2] = 0.0 THEN
        BEGIN Q := 1.0; P := -2.0 END
        ELSE
        BEGIN
          Q := COF[NDEG]/COF[NDEG-2];
          P := (COF[NDEG-1]-Q*COF[NDEG-3])/COF[NDEG-2]
        END;
        IF NDEG = 2 THEN GO TO QADRTIC;
        R := 0.0;
      END;


      COMMENT WHILE TRUE DO ! LOOP FOR LINEAR FACTOR;
      BEGIN
ITERATE:
        FOR I := 1 STEP 1 UNTIL LITER DO
        BEGIN
BAIRSTOW:
          BEGIN
            FOR J := 0 STEP 1 UNTIL NDEG DO
            BEGIN
              B[J] := COF[J]-P*B[J-1]-Q*B[J-2];
              C[J] := B[J]-P*C[J-1]-Q*C[J-2];
            END;
            IF COF[NDEG-1] # 0.0 THEN
            BEGIN
              IF B[NDEG-1] # 0.0 THEN
              BEGIN
                IF ABS(COF[NDEG-1]/B[NDEG-1]) < ACCUR
                THEN GO TO NEWTON;
                B[NDEG] := COF[NDEG]-Q*B[NDEG-2];
              END;
            END;

BNTEST:
            IF B[NDEG] = 0.0 THEN GO TO QADRTIC;
            IF ABS(COF[NDEG]/B[NDEG]) > ACCUR
            THEN GO TO QADRTIC;
          END;


NEWTON: 


FOP  J - 0 STEP  1 UNTIL  NDEG  DO 
BEGIN 

01  Jl  - COF IJI*P*OI J-l )l 
El J)  - 01 J l*P»E( J-l 1 i 
END: 


IF  01  NOEG  I = 0.0  THEN  GO  TO  UN: 

IT  ACCUP  I AOS  I COF I NDEG 1/0 1 NDEG 1 1 THEN 
BEGIN 


LIN: 


if  l inf; ap  . o then  go  to  petupn 

ELSE  GO  TO  ITEPATE 
ENO: 


C I NOE.  C - 1 1 * -P»CIN0EG-2I  Cl*CIN0EG-3|i 

SCL  * C I NDEG- 2 1 *C I NDEG- 21 -Cl NDEG- 1 1 »C I NOEG- 3 1 1 

IF  SCL  = 0.0  THEN 

BEGIN  P * P-2.01  Q - Q« 10*1 .01 1 ENO 
ELSE 

P • P*iniNOFFi- 1 1 *C  I NOEG-2 1-01  NOE G 1 *C I NDEG-  3 1 1 /SCL  i 
0 * fj*(  • BlNOEG-l  I *C  I NOEG- 1 )*BIN0EG1*CIN0EG-21 1/5CL 
ENO. 


IF  EINOLG-I I = 0.0  THEN  P - P-l 
ELSE  P - P 01  NDEG I /El NOEG- 1 1 1 
ENO  ITEPATE  LOOP: 

ENO  LINEmP  FACTOR  LOOP: 


PS  . PI:  OS  - OTi  PT  - Pi  OT  * 01 

IF  PEU  r C i.O  THEN  ACCUP  * PCCUP/IO.0I 
PEU  • -PEL'! 

PEUEP5E: 

GO  TO  PEUSEO: 

ENO  F0CT0P  FOUND i 
Listing  of  the  short  subject  algorithms


OhOPTICi 

IF  1ST  < 9.0  THEN 
BEGIN  P * P/<Ji  <3  ► I/O;  ENOi 
IF  ig-(P/Z.«lslP/?.OM  ' 9.0  I HEN 
BECIN 

PPE1N0EG1  •■  PPE IN0CG-1 1 • -P/:.0i 
SCI  • S0PTIQ-IP/2.OIMP/2.0I'; 

PIHIN0EG1  * SCI i 
Pint  NOEG- 1 1 * -sen 
END  ELSE 
BEGIN 

SCL  ••  SOPH  IP/:  OluP/2. 01-01; 

IF  P ( 9.0  I HEN  PPEIN0EG1  - -P/20«SCL 
ELSE  PPEINDEGI  - -P/2.0-SCLi 
PPE t NOEG- 1 I • 0 'PPEINDEGI l 
PirtlNOEGl  - Pill  1 NOEG-  1 1 O.Ol 
END  i 

CONDI  NOEG  1 hCCUPi  CONDI  NOEG- II  - HCCUPi 
NOEG  NOEG-?  i 

FGP  J - 0 SIEP  I UNI IL  NOEG  OC 
BEGIN 

IF  BIJI  » 0.0  THEN  COFIJI  - 0.0 
ELSE  IF  HBSlCOFi  JI/BIJ1)  < PC  CUP  THEN  COfIJl  - 01 J I 
ELSE  COFIJI  * O.Oi 
ENO  * 

GO  10  INI T > 

ENO i ! UNTIL  NDEG  = 0 00  lOOPi 
BETUPNi 

ENO  I I PPOCEOtJPE  POOI POL  i 


I SET  - li 

10EG  » li  IIEP  * 10;  NOIGS  • 7i 
COEFFSIOI  - 10090000.(11  COEFfSlII  * -909130.01 
COEFFSIZI  • -103900.01  C0EFESI3I  - 100000. Oi 

cor.FFsm  - i .oi 

POOIPOi  (IOEG. COEFFS. ITEP. NOIGS. RPEHL.Pin.lG.CONCONI  1 
PUTOUT I ISET 1 1 


1SEI  * Z I 

1DEG  * 11  HEP  * (Oi  NOIGS  • 7l 
COEFFSIOI  * I.Ol  COrFFSIll  *•  -3.0l 
COEFFSI 2 1 - 20.01  COEFFSI 3 1 14.01 

COEFfSIll  - St.Oi 

POOIPOUIOEG.COEFFS.  HER. NOIGS. PPEhI  .RIMhG-CONCONi  i 
PUTOUl ( ISET I i 


ISET  . 3i 

10CG  - Gl  IIEP  » Mill  NOIGS  - ?l 
COEFFSIOI  - I.Ol  COCFFSl 1 I ♦ -2  Ol 
COEFFSI 2 1 - Z.Cu  COEfFSUI  * I.Ol 
COEFFSI 4 I - G.Oi  COEFf  SIS  I * -6  0i 
C0EFFSIG1  - 8.0, 

POOI  POL  I IOCG.COEFFS.  llEP.NOIGS.RPl.HL  .PIUhG.CONCON)  ; 
PUTOUT I ISET ) i 


lsr.r  - ii 

(DEC  • Si  ITEP  - 11)1  NOIGS  - 71 

coeffsioi  . i.oi  coeffsui  • i.O; 

COEFFSIOI  ••  -B.Oi  CUEFrSUI  • 16. Ol 

COEFFSUI  - 7.0i  COEfFSISI  • IS  0: 

POOIPOLUOEG.COEFFS.  IIEP  .NOIGS. PPE PL  . PIUhG.CONCON  1 1 
PUTOUT ( ISE1 ) i 


ISET  - Si 

10EG  - li  11EP  * 19:  NOIGS  - ?i 

COEFFSIOI  ♦ I.Ol  COEFFSUI  * S.Oi 
COEFFS  121  - 3.i)i  COE F F SI 31  ••  -S-Oi 
COEFFSUI  - -9.9i 

POOTPOU IOEG. COEFFS.  ITEP. NOIGS. PPEPL.RIMpG.CONCDNi  i 
PUTOUl  USED  i 


ISF.T  - 6 1 

IOEG  - 3 i IIEP  - (0;  NOIGS  - 7; 

COEFFSIOI  - I.Ol  COEFFSI 1 i - 8 9 
COEFFS  121  - I7.0i  COEFFSUI  - - 10 . Oi 

ROOTPOLF  IOEG. COEFFS. ITEP. NOIGS. PPEPL  .PlNiiG.CONCON)  i 
PU10UTF ISET 1 1 

ENO 


wait 


E-2 


CROUT


COMMENT  THIS  IS  CALGO  ALGORITHM  43,  CROUT  LINEAR  EQUNS.

ALGORITHM  BY  HENRY  C.  THACHER  JR.

NEW  INNERPRODUCT  ROUTINE  AND  OTHER  DRESSINGS  BY  A.  LUNDE

C-MU  1972;

HPPiif  l DUiiT  1 1 1 IS  ■ 1 1 IS  I . PI  GUT  1 1 1 ISI  .SOL  I It  ISI  i 
INIEGFP  liPPPY  L01PGI 1 1 ISI  i 
PI  PI  D T PUN  l 

FDPWhPO  IhBEL  SINGULPRi 
1NILGE  P I.Ji 


PEnl  PPOCEOUPE  INPPRIIPL .PP.LlN.LOW.npxI i 
UitUE  LIN.lOH.H9X; 

IMEGEP  LIN.LOW.HiiXi 
iiPPhY  PL  .PP: 

BEGIN 

LONG  PCpL  SUHi 
INTEGER  XX  i 

sun  - o.Pi 

FOP  LX  - LOW  SUP  I UNTIL  HPX  DO 
sun  - sun*PLit.iN,i  xi*hpii.xi i 
INPI’RI  • sun  l 
ENO; 

PEPl  PPDEFOUPE  INPPP7lnPPY.LIN.IOL.LOW.n9XH 
Util  UF  L IN. LOG  .L0U.H9X  i 
INTI GEP  L IN.FOL  .IOW.HmXi 
mPPiW  HPPYi 
BEGIN 

LONG  PF9I  Sim l 
I N Ft ITI P IXl 

sun  - 9.i:u 

FOP  M • LDW  STEP  I UNTIL  Him  DO 

sun  - Slin>9PPYILIN.FXI»PPPYIKX.I:0LH 
INPPP7  • sin  i 
END; 


PPilCrOUP!  CPDUT.-ihPP.PHS.NBYN.PES.  IV01P. DEI  .PEPE9T  1 1 

Vi  HUE  NBYN.F'FPlHTi 
1,1'P.iY  nPP.PHS.PES i 
INTI  ITEP  NBYMi 
IMIEGFP  iiPPiTY  IVOTPi 

pi  ni  on ; 
mini  fhn  pipepti 
OEGIN 

INUGFR  IX.JX.I-X.IHPX.IPl 
PLm(  ump.ouot  i 

DU  • I.Ol 

IF  PIPEhI  thin  GO  to  LPBLBi 
f i)P  I '<  . I SIEP  I UNI  II.  NlIrN  OU 
BEGIN 

irnp  . 0.9 1 

Flip  l<  * I X STEP  I UNI  IL  NlIrN  DO 
III  G IN  , . 

,.1'PI  |«,l  X|  . PPP||X,»  X|-1NPPP2IPPP.IX.IX,I,I,X-III 
li  iGSihPRIIX.KXII  > TEHP  I HEN 
01  GIN 

1 1 HP  - 905'OPRI IX.FXI 1 1 
IHhX  • IX; 

INI)  i 
I NO; 

ivniPiFxi  - mix; 

IF  irinX  » I « THEN 
Bf  GIN 

or  i . - on  i 

Fill’  j;  - I STEP  1 UNTIL  NHlN  oo 
OEG'N 

II  ,1P  • HPPII  X . JX  1 1 
hP'Tfx.jxi  - fiPPllnux.jxli 
Hi  linnr.JXI  . TEHP; 

END; 

TIM*  • PUSH  >1; 

push  xi  • piisuni'ixii 

PH5iinHxi  . imp 

lndi 


ir  mppiu.lxi  - 9.o  thin  r,o  to  singuutpi 


gum  • 1.9/hPPII  X.MIi 
FOP  IX  - I AM  STEP  I UNI  IL  NB(N  00 
HPPIIX.IXI  - OUOMnPPIlx.tXli 
fDP  J»  * LX* I STEP  I UNTIL  NOYN  DO 

fiPPILX.JX  I - 9PPILX.JXI  - INPPP2I9PP.XX.JX.  I.  XX-lll 
PHSIIXI  . PMSIXX I - lNPPPH9PR.PHS.fX.  I.  XX-Ill 
ENO; 

GO  TO  LBL7i 




LABL6:  COMMENT  NEW  RIGHT  SIDE  ONLY;
F0«  KX  * I STEP  I UNTIL  N8YN  DO 
BECIN 

TEMP  * PHSI IVOTPIFXI 1 1 
PHSl 1UOTPIKXI I * PHSIKXH 
PHSIKX1  « TEMPI 

RHSIKX1  > PUS  IF  XI  - 1NRPPIIARP,RHS.I>.X.I.KX-Ill 
END  1 


FOP  KX  ► NBYN  STEP  -I  UNTIL  I 00 
BEGIN 

IF  NOT  REPEAT  THEN  OET  » ARRUX.KX)*DETi 
RESIKXI  . (PHSIFXI 

- INPPRI (ARR.PES.KX.KX*!  .NBYN)  I/APPIFX.KXI  i 

END  1 

ENOi  ! THAT  HAS  CPOUT  2.1 

FOR  I - I STEP  I UNTIL  IS  DO 
BEGIN 

FOP  J - I STEP  I UNTIL  IS  DO 
E0UATI1.JI  - O*J)/Z.0i 
PIGHTII I - LNU/3.0U 
EQUAT 1 1 . 1 1 * EOUAT ( I > 1 1 * IS- 1 1 
END  1 

CROUTZIEQUAT.RIGHT.IS.SOL.IOlAG.OTRMN.FHLSEIi 

GD  TD  EXIT; 

HPITECICI'll 

PPINTI0TPMN.I0.61l 

HPITECICI'll 

FOP  1 I STEP  I UNTIL  IS  00 
BEGIN 

WRITE!' Id'll 

FOP  J * I STEP  I UNTIL  IS  00 
PRINT (EQUAT 1 1 . J 1 . 10.61 1 
END  1 

URITECICI'H 

FOR  I - I STEP  I UNTIL  IS  DO 
PRINTILDIAGIII.IO.eii 
WRITE* " 1C  I” 1 1 

FOR  I * I STEP  I UNTIL  IS  00 
PRINT! RIGHT  1 1 1 . 10.61 ■ 

HPITECICI'll 

FOP  I * I STEP  I UNTIL  IS  00 
PRINT ISXI I 1 . 10.61 1 
GO  TO  EXITi 

SINGULAR  1 

HRITECICISINGULAPICI'li 

EXITi 

ENOi  I ENO  Of  MAIN  PPOGPAM.  1 
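
For readers who cannot easily follow the scanned listing, the Crout scheme it implements can be sketched in modern notation. The sketch below is a minimal Python illustration and not a transcription of the ALGOL text: the function name crout_solve, the list-of-lists matrix layout and the ValueError raised for a singular matrix are assumptions made for this example. It factors A = LU with a unit-diagonal U, picks the largest pivot in each column, accumulates the determinant, and solves for one right-hand side.

```python
def crout_solve(a, b):
    """Solve a*x = b by Crout factorization with row pivoting.

    Illustrative sketch only; names and layout are assumptions.
    Returns (x, det). Raises ValueError if a pivot column is all zero.
    """
    n = len(a)
    a = [row[:] for row in a]      # work on copies
    b = b[:]
    det = 1.0
    for k in range(n):
        # Column k of L: subtract inner products, remember the largest entry.
        pivot_row, pivot_val = k, 0.0
        for i in range(k, n):
            a[i][k] -= sum(a[i][p] * a[p][k] for p in range(k))
            if abs(a[i][k]) > pivot_val:
                pivot_row, pivot_val = i, abs(a[i][k])
        if pivot_val == 0.0:
            raise ValueError("singular matrix")
        if pivot_row != k:
            # A row interchange changes the sign of the determinant.
            a[k], a[pivot_row] = a[pivot_row], a[k]
            b[k], b[pivot_row] = b[pivot_row], b[k]
            det = -det
        det *= a[k][k]
        # Row k of U (unit diagonal), then forward substitution on b.
        for j in range(k + 1, n):
            a[k][j] = (a[k][j] - sum(a[k][p] * a[p][j] for p in range(k))) / a[k][k]
        b[k] = (b[k] - sum(a[k][p] * b[p] for p in range(k))) / a[k][k]
    # Back substitution with the unit upper triangular factor.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = b[k] - sum(a[k][j] * x[j] for j in range(k + 1, n))
    return x, det
```

As in the listing, the inner products dominate the work, which is why the subject program isolates them in separate procedures.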


BEGIN 

COMMENT  ALGORITHM  113  FROM  THE  COLLECTED  ALGORITHMS  COLUMN
OF  THE  CACM.  ALGORITHM  AUTHOR  IS  ROBERT  W.  FLOYD.

MAIN  PROGRAM,  I.E.  CALLING  SEQUENCE,  SUPPLIED  BY  A.  LUNDE;

ARRAY  BEFOPE 1 1 1 401 1 . AF 1EPI 1 1 4001 1 
INTEGER  INFINITY. Fi 

PROCEDURE  TREESOPTIUNSOPTED.'i.SORTEO.KI  1 
VALUE  N.Ki 
INTEGER  N.l  i 
ARPAY  UNSORIED.SOPIEOi 
DEGIN 

INTEGER  I.Ji 

INTEGER  ARRAY  MIIiZ*N-lll 

FOP  I * I STEP  I UNTIL  N 00  HIN'I-II  ► I ■ 1 0OEJ0*N*  I - 1 1 
FDP  I * N-I  STEP  -1  UNTIL  I 00 
Mill  » IF  UNSDPTEDIMIZM)  OIV  10000) 

< UNSDPTEDIMIZMMI  OIV  10000)  THEN  MIZ>II 
ELSE  ME 1*11 1 

FOR  J . I STEP  I UNTIL  K DO 
BEGIN 

5DPTE0IJ)  « UNSOPTEOIMII)  OIV  IDOOOIi 
i - Min-iMiii  oiv  10000) *)ooo0 1 
Mill  * INFINITY  • 10000: 

FOR  I * I OIV  Z WHILE  I > 0 DO 
Mill  * IF  UNSDRTEDIMIZ*! I OIV  IDODOI 

< UNSOPTEOI  M12*I  * 1 1 OIV  ID0O0I  THEN  Mt2*>I  I 
ELSE  MIZMMIi 


ENO  TREESDPT 1 
INFINITY  • 40II 

FDP  I * I STEP  1 UNTIL  400  OD  BEFOREIKl  * 401.0-Li 
BEFORE  1 40I I * IOOOD.Oi 

TPEESDRT IBEFDRE . 400. AFTER. 400 1 1 

FOR  L * I STEP  I UNTIL  333  DO 
IF  hFTERII  I > AFTERIi:*!!  THEN 
BEGIN 

HPITECICI'll 
PPINTIK .6-0) 1 

WRITE!'  OUT  OF  OROEPICI'li 
END: 

END  MAIN  PPOGPAM 1 
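
The tournament idea behind TREESORT can be sketched as follows. This is a simplified Python illustration under stated assumptions, not a transcription: the listing appears to pack a value and an index into one integer, whereas the sketch keeps a separate list of values and stores leaf indices in the tree. The name treesort and its argument order are chosen for this example only.

```python
def treesort(unsorted, k):
    """Return the k smallest items of `unsorted` in ascending order.

    Tournament-sort sketch (assumed layout): nodes 1..n-1 are internal,
    nodes n..2n-1 are leaves, and every node holds the index of the
    smallest remaining value in its subtree.
    """
    n = len(unsorted)
    infinity = float("inf")
    vals = list(unsorted)
    m = [0] * (2 * n)
    for i in range(n):
        m[n + i] = i                       # leaves point at the items
    for i in range(n - 1, 0, -1):          # build the tree bottom-up
        left, right = m[2 * i], m[2 * i + 1]
        m[i] = left if vals[left] < vals[right] else right
    result = []
    for _ in range(k):
        winner = m[1]                      # root holds the current minimum
        result.append(vals[winner])
        vals[winner] = infinity            # retire the winner
        i = (n + winner) // 2
        while i > 0:                       # replay matches up to the root
            left, right = m[2 * i], m[2 * i + 1]
            m[i] = left if vals[left] < vals[right] else right
            i //= 2
    return result
```

Each extraction replays only the matches on one leaf-to-root path, which corresponds to the `I DIV 2` loop in the listing.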


E-1 




PERT 


BEGIN 

INTEGER  NEVNTS.IXI 
INTEGEP  APPAY  IN  IT  .LAS?  .LINT- 11 :3001 i 
APPAY  ESlinE.EABLYS.lAlEFlliJOOIi 
PEW.  TSTARTi 

PROCEDURE  PERTINMAX.  1BEG.JEN0.TE.ST .EHAX.LN)  .ES.AT  1 1 
INTEGER  NUAX.EHAXi 
PEW-  ST: 

INTEGER  APPAY  IBEG.JENO.LNkl 
REAL  APPAY  TE .ES.AT I 
VALUE  NHAX  -ST  i 
BEGIN 


INTEGEP  I i . „„ 

INTEGER  NX.IEX.ISX.ITX.NXi 
REAL  AXX.XXXI 
SWITCH  SWZ  . Gl.GZi 


PPDCEOUPE  SCANITOBJIi 
INTEGER  TOBJi 
BEGIN 

INTEGEP  KXs 

IF  IEX  » I THEN 

BEFOP  NX  . IEX -I  STEP  -1  UNTIL  I 00 
IF  TOOJ  ■ LNK1IXI  THEN 
BEGIN  TDBJ  » KX i GO  TO  RETURN:  ENO 

LNK1IEXI  » TDBJ:  TOBJ  * IEX:  IEX  * 1EX»1 : 

RETUPN. 

ENO  SCAN: 


FDR  NX*'  I STEP  I UNTIL  NflAX  DO 

BEGIN  SCAN! JEND1NX1 ) : SCAN: IBEGINXI I : 


ENO: 


ISX  . I:  AXX  . ST: 


EtIAX  . IEX-Il 
WHILETRUEDDi 

FOp'lEX^'l  STEP  1 UNTIL  ETIAX  00  AT  I IEX  I . AXX: 

SZFDP  NX  • 1 STEP  1 UNTIL  NHAX  OD 
BEGIN 

IF  LNK1 1BEGINX I ] > 0 THEN 
BEGIN 

SWITCH  SWI  * OI.BZl 
GD  TD  SWIIISXII 


Bit 


82: 


XXX  . ABSIATI IBEGINXIll  ♦ TE1NXI: 

IF  XXX  X ABS( AT  1 JEND1NX ] 1 ) THEN  AT1JEN0INXI1 
GD  TD  ESACI: 


-XXXI 


XXX  . ABSIATI IBEGINXI ) 1-TEINX 1 : 

IF  XXX  < ABSIATIJEN0INX1I)  THEN  AT1JEN0INX1] 

ESACI: 

END: 

FDr'iEX  * 1 STEP  1 UNTIL  EIWX  00 
BEGIN 

IF  LNKUEXK  0 THEN 
BEGIN 

IF  ATI IEX 1 < 0 THEN 
BEGIN 

LNHIEX1  - ABSILNT.lirxl): 

AT  I IEX  1 - ABSIATI IEX II : 

END: 

END  ELSE 

IF  AT  HEX  I X-  0 THEN 
BEGIN  LNU1EX1  - -LNItlEXI: 

ELSE  ATI IEX  1 - ABS: AT  1 IEX 1 1 : 

END: 


-XXX: 


KX  - KX*I : 


KX  » KX-li  ENO 


JEN0INX1  • ITX: 


PPOLEOUPE  PUTOUTINEV.LX.EAS  XLF): 
VALUE  NEVi 
INTEGER  NEV: 


INTEGER  APPAY  LX: 

APPAY  EAS.Xl.F: 

BEGIN 

WPITEIMCI’II  PPINTINEV.Y  .0) : WRITE!*  EVENTS1CI*': 

GD  TD  PFTURN: 

FDP  IX  • 1 STEP  1 UNTIL  NEV  DO 

BE|jPIT£I*ICI*):  PPINTILXIIXI. 1.0»i 

PPINllEASI 1X1. 10.1)1  PR  INI 1XLF I IX 1. 10.1) i 
IF  mOSi  IEhSI  1XI-XLF1 IX 1 1 ) < O.0O1 
THEN  WPITEI * CPITICAL*): 

ENO: 

WRIIEI’ICI*): 

RLTUPN: 

ENO: 


PPDCEDUPE  WORK(NATTS): 

VALUE  NhCTSi  INTEGER  NACTSi 
BEGIN 


IF  KX  ■ 0 THEN  GOTO  SZ: 

GDTO  SUZ1ISXI: 

G1 : 

ISX  * 2 t 

FDP  NX  • I STEP  I UNTIL  NHAX  00 

KITX  . IBEGINXI  I IBEGINXI  » JEN0INX1: 

END: 

AXX  . 0: 

FDP  IEX  . I STEP  I UNTIL  E«AX  DO 

B£ES1IEXI  » ATI  IE  XI : LNMIEXI  - ABSILNK I IEX  I ) i 

IF  ATI  IE  X 1 > AXX  THEN  AXX  * ATIlEXll 
ENO: 

GO  TO  WHILETRUEOO: 

'“'FOP  IEX  * I STEP  I UNTIL  EHAX  DQ  LW.11EXI  - ABSILNM 1EXD : 
ENO  PERT: 


IN1TIII 

INITIZI 

INI  I (31 

1NITI1I 

INITI51 

IN1TIB1 

INITIZI 

1NITIBI  • 

INITIO!  * 

lNinim  ■ 

INI T 1 E 1 1 ' 

INITIIZI  ' 

INI T 1 1 3 1 

INIT1HI 

IN1T1151 

1NITI1BI 

INITIIZI 

1N1TI1B1 

1NIT1 101 

TNI  Tl.’OI 

1N1T1ZI  1 

IN1TIZZI 

TNITIZ3I 

INI T I El  I 

INITIES1 

IN1TIEBI 

1N1T1ZZI 

INITIZBI 

INITIZ91 

INI  U 30 1 

INI T 1 31 1 

IN1TT3ZI 


I: 

II 

1: 

I: 

Zi 

Z: 

Z: 

3: 

3: 

- 1: 
* G: 
. bl 
. ?! 
- 7: 
► 7: 
. Si 


Bi 


9: 

10: 

Si 


10  i 

IZ: 

II: 

11: 

E. 

7: 

. 11: 

• 9: 

• Hi 
► 1: 


LAST! 11  • 
LAST  121  - 
LmST13I  ► 
LAST  111  - 
LASTIS1  * 
LmSUBI  • 
LASTIZ1  • 
LmSTIBI  - 
1 AST  1 9 1 . 
LAST  1 101  * 
LAST  lilt  . 
LAST  I IZ I * 
LAST  1 1 3 1 • 

last i in  - 

LASTIISl  - 
LAST  I IB  I - 
LhSTIIZI  - 
LAST  1 IB  I • 
LAST  1 191 
LAST  I Z0 1 • 
LASTIZII  • 
LAST  I ZZ I - 
1A5TIZ31 
LA5TIZ1! 
LhSIIZSI 
LAST 1ZB1 
LAST  I Z? I 
LASH  ZB  I 
LhSTIZDI 
LAST  1 301 
LAST  1 3 1 1 
LAST  1 3d 


Zi 

EST  1HE1 1 1 * Z Si 

3i 

ESTinElZl  * I BI 

4: 

EST IHE13I  * 3 - 0 1 

10» 

eshhehi  * 1B.1I 

S> 

ESTINE1S1  . 1.21 

Gt 

EST1T1E1B1  > 3.  B i 

7» 

ESTIflEl  71  • G.?i 

Gt 

estiheibi  . Ill 

7t 

EST1HE191  > 1 .3' 

3: 


B: 

9: 

IV: 
11: 
III 
II . 

- IH 
lZi 
IZ) 

Hi 

- Ill 

- 13: 
. 13: 
. Hi 

13: 

Si 

- IZ: 

- IZi 
- 1 3 1 
► 101 


EST I HE  1101 
ESTIHEI1I1 
EST1I1E11Z1 
EST1T1EI131 
EST 1HE 1111 
Esnnnisi  * 
ESTlnEllGl  * 
EST  IMF.1 171  • 
EST  HIE  1 IB  1 • 
ESTIMEI 191 
ESTIT1E1Z01  • 
ESTMEIZI 1 ■ 
LSTlflE  IZZ1  i 
F.SI1DE1Z31 
ESTIT1EIZ11 
EST’TIEIZSl 

estideizbi 

EST?  IEIZ7I 
ESTltlElZBl  - 
ESTIT1EIZ9I 
ESTinC1301 
EST1HE131 I 
ESTINEI3Z1 


BE: 
Z.Zi 
1.9: 
3.ZI 
■ I. II 
: G.Ol 

- G.Ol 

- B.II 

► 0.  7 1 

.10i 

. 0.7l 
. E li 
. 3.Bi 
. O.Zi 
. Z.Si 
. 0 . 9 1 
11.11 
6.01 
. 7.31 
. 3 . B i 

► 0.71 

► IZ-Bl 


TSTiTPT  . 0.01 


PERT  iNhCTS.INll  .LAST  .EST1T1E  . TSTaPT  . NEVNTS. L INK. EARL YS.LATEF)  i 
PUTDUTINELNTS.LIN)  .EARLYS.LATEFI  I 


ENO  i 


WDPL l 3Z I : 
WDR):iZ7)i 


ENO  i 
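
The PERT computation in the listing makes a forward pass for the earliest event times and then, with the roles of begin and end events interchanged, a backward pass for the latest times. The Python sketch below illustrates that two-pass idea by simple repeated relaxation. It is an assumption-laden simplification: the function name pert, the tuple layout of an activity and the use of dictionaries are invented for this example, the sign-flagging bookkeeping of the listing is omitted, and the network is assumed to be acyclic. The 0.001 tolerance for marking an event critical is taken from PUTOUT.

```python
def pert(activities, start=0.0):
    """Earliest and latest event times for an acyclic activity network.

    activities: list of (begin_event, end_event, duration) tuples.
    Returns (earliest, latest, critical_events). Illustrative sketch only.
    """
    events = []
    for beg, end, _ in activities:
        for ev in (end, beg):
            if ev not in events:
                events.append(ev)
    # Forward pass: relax until no earliest time increases.
    earliest = {ev: start for ev in events}
    changed = True
    while changed:
        changed = False
        for beg, end, dur in activities:
            if earliest[beg] + dur > earliest[end]:
                earliest[end] = earliest[beg] + dur
                changed = True
    # Backward pass: start every event at the project finish time.
    finish = max(earliest.values())
    latest = {ev: finish for ev in events}
    changed = True
    while changed:
        changed = False
        for beg, end, dur in activities:
            if latest[end] - dur < latest[beg]:
                latest[beg] = latest[end] - dur
                changed = True
    critical = sorted(ev for ev in events
                      if abs(earliest[ev] - latest[ev]) < 0.001)
    return earliest, latest, critical
```

For a small hypothetical network, `pert([(1, 2, 2.5), (1, 3, 1.8), (2, 4, 3.0), (3, 4, 1.0)])` marks events 1, 2 and 4 as critical.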


HAAV1E 


BEGIN 

COMMENT  THIS  IS  CALGO  ALGORITHM  NO.  257,  HAAVIE  INTEGRATION.
ALGORITHM  BY  ROBERT  N.  KUBIK.  PUBLISHED  CACM  1965.

TYPED  BY  A.  LUNDE.  C-MU  1972;

REAL  A.B . EPS. MASK .Y , ANSWER i 

REAL  PROCEDURE  HftUIEl ft >B. EPS.GRHND.fi) I 
VALUE  ft. B. EPS, Ml 
INTEGEP  fl; 

REAL  A.E.EPSi 
REAL  PROCEDURE  GPrtNDi 

BEGIN 

REAL  H.rilDPTS.SUIIT.SUflU.O.Xi 
INTEGER  I.J.K.Ni 

APPftY  TIIiIZI.UIIi 121  .TPREVI 1 1 12I.UPREV1 li 12 1 j 

ENDPTS  * GPANDI A) I 

ENOPTS  » 0S*lGRANOiBl»£NOPTS)i 

SUNT  * 0.0i 

I » N - I< 

H . B-fli 
ESTIMATE i 

Till  • H»IENDPTS*SUMT)I 
SUMU  » 0.01 

X - A-H/2.01 

FOP  J * 1 STEP  1 UNI IL  N OC 
BEGIN 
X . X*Hi 

SUMU  > SUMU«GRftNOIX)j 
ENOl 

UIII  * H*SUfHJi 
X . ll 
TEST  i 

IF  ABS(TIKI-UIKl)  <•  EPS  THEN 
BEGIN 

HAV1E  > 0,  S*U1KI*U1K  I)  l 
GO  TD  EXIT) 

ENOi 

IF  K • 1 THEN 
BCG  IN 

0 • D t I2*K>| 

TI)‘II  . 1 0*T IK  1 -TPREVI K 11/10-1.0)1 

TPPEVIK*!!  * T IK  1 1 

UIK*1 I • I 0*UIK I-UPREVIK 11/10-1.0)1 

UPPEVIK!  * UIKli 

K * KMl 

IF  K - M THEN 

BEGIN 

HAVIE  * MASK) 

GO  TO  EXITi 
ENOi 

GO  TO  TEST  I 
ENOi 

H * H/2.01 
SUMT  4 SUNT  * SUMU  i 
TPPEVIK 1 4 T IK  1 1 
UPPEVIKI  4 UIKli 
I 4 I«  1 j 
N 4 Z»Ni 
GO  TO  ESTIMftTEi 
EXITi 

ENOi  ' ENO  OF  HAAVIE  INTEGRATOR  i 


PEAL  PPOCEOUPE  EXPZIXU 
VALUE  Xi 
REAL  Xi 

EXP?  * EXP(-X*X)i 

A 4 O.Oi 
B 4 l .Oi 
EPS  4 O.OOOOSi 
MASK  4 9.39, 

ANSUER  4 HAVIEIA.B. EPS. SORT .12)1 
UPITEf’ICIMi  PPINTiANSUER.t.lOli  WRITECICl'li 
EPS  4 0.900001 i 
A 4 0.01 

B 4 9 , 3 1 

ANSUER  4 HAVIEIA.B.EPSEXP2.12): 

HRITEI’ICI')!  PRINT! ANSVIER.4. 10)  i WPITET'IC)*): 
ENOi  ! ENO  OF  MAIN  PPOGPAM. 
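
The control flow of the integrator is hard to recover from the scan, so a Python rendering of what the listing appears to do is given here as a reading aid. It is a sketch under assumptions: the names follow the ALGOL identifiers where they are legible, MASK is passed as a parameter with the default 9.99 guessed from the garbled main program, and the GO TO structure is replaced by two nested loops. The method keeps a trapezoid estimate T and a midpoint estimate U, extrapolates both in the Romberg manner, and stops when a pair agrees to within EPS.

```python
import math


def havie(a, b, eps, grand, m, mask=9.99):
    """Integrate grand over [a, b] by paired trapezoid/midpoint extrapolation.

    Returns mask if m extrapolation columns are exhausted first.
    Illustrative sketch; the parameter names are assumptions.
    """
    t = [0.0] * (m + 2)
    u = [0.0] * (m + 2)
    tprev = [0.0] * (m + 2)
    uprev = [0.0] * (m + 2)
    endpts = 0.5 * (grand(a) + grand(b))
    sumt = 0.0
    i = n = 1
    h = b - a
    while True:                            # label ESTIMATE
        t[1] = h * (endpts + sumt)         # trapezoid rule with n intervals
        x = a - h / 2.0
        sumu = 0.0
        for _ in range(n):                 # midpoint rule with n intervals
            x += h
            sumu += grand(x)
        u[1] = h * sumu
        k = 1
        while True:                        # label TEST
            if abs(t[k] - u[k]) <= eps:
                return 0.5 * (t[k] + u[k])
            if k == i:
                break                      # no more columns: halve the step
            d = 2.0 ** (2 * k)
            t[k + 1] = (d * t[k] - tprev[k]) / (d - 1.0)
            tprev[k] = t[k]
            u[k + 1] = (d * u[k] - uprev[k]) / (d - 1.0)
            uprev[k] = u[k]
            k += 1
            if k == m:
                return mask
        h /= 2.0
        sumt += sumu                       # old midpoints become interior points
        tprev[k] = t[k]
        uprev[k] = u[k]
        i += 1
        n *= 2
```

A call such as `havie(0.0, 1.0, 1e-6, lambda x: math.exp(-x * x), 12)` corresponds to the second test in the main program, which integrates EXP2.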


ISING


BEGIN 

COMMENT  THIS  IS  ALGORITHM  355  OF  THE  CACM  ALGORITHM  SECTION.
PUBLISHED  IN  CACM  12,10  (OCT  1969)  P.562.

OUTER  BLOCK  WITH  I/O  AND  OTHER  STATEMENTS  INTRODUCED
AND  VALUE  PARTS  AND  REMAINDER  OPERATOR  ADDED  BY  A.  LUNDE.
CARNEGIE-MELLON  UNIVERSITY.  JULY  1972.;

INTEGEP  APPAY  SEOUI 1 1 10‘)1 1 

INTEGER  Max  .ONES. SHIFTS.  I .UPPEP.MAXm.IMI  I 

PkOCEDlIrE  ISINGiN.X.T.SIi  VALUE  N.X.Tl 
INTEGER  N.X.Ti  INTEGEP  APPAY  Si 
BEGIN 
INTEGER  t.i 

INTEGER  APPAY  L.MIIiT  OIV  2»lli 

PROLTDUPr  S0PTiL.n.2)i  VALUE  2i 
INTEGEP  APPAY  L.Ml  INTEGER  2l 
BEGIN 

INTEGEP  P.I.J.ML.2B, 

FOR  ML  • I STEP  1 UNTIL  N 00  S1NLI  * 2i 
R > I 4 |,  ZB  ► l-2i 
Aft,  J • R*LI 1 1- 1 > 

FOR  ML  • P STEP  I UNTIL  J DO  SIMLI  ► ZBi 
IF  1*1  <=  K THEN 

BEGIN  R 4 J«MI 1 1 * 1 1 I • 1*1 1 GO  TO  AA  ENOi 
GO  TO  EXIT) 

HPITEI'ICDl 

FOP  ML  - 1 STEP  I UNTIL  N 00 
BEGIN 

IF  I ML  PEM  CH  4 O THEN  WRITE  I ’ IC  I — * ) I 
PPINTIS1ML). 2.0)1 
ENOi 

EXIT: 

ENO  SORT i 

PPOCEDUPE  BISORTIL.MII  INTEGER  ARRAY  L.Ml 
BEGIN 

SDPTIL.M.O)!  SORT IM.L . 1 ) 

END  BISOPT i 

PPOCEDUPE  COMPDSEiX.) .L.Pii  VALUE  X.Ki  INTEGER  X.Ki 
INTEGER  APRAY  Ll  PPDCE DUPE  Pi 
BEGIN 

INTEGEP  l.Ai 
.'XU  THEN  CD  TO  CC l 
li 1 1 4 x-t.*n 

FDP  I*C  STEP  I UNTIL  K 00  LI  1 1 4 II 

Pi 

IF  ) <4  I THEN  GO  TO  CC 1 

A 4 I , 

BB,  IF  LIAI  > I THEN 
BCGIN 

lift)  4 LIAI-Il  LIAM]  4 L I ft*  1 1 * 1 1 Pi 
IF  A » K-l  THEN  A . A* I i 
GO  10  BB 

END:  CDMMCNT  LIAI  > I LOOPl 

LIAI  • LIAMU  L I A* ) I 4 |,  ft’ 4 0-|| 

IF  A >=  I THEN  GO  TO  BB: 

CC, 

END  COMPOSE  1 

K 4 T OIV  CM: 

IF  IT  PEM  2i  • I THEN 
BEGIN 

PPOCEDUPE  Pli  BISnPTlL.Hl! 

PPOCEOUPE  PC : COMPOSE. IN- X ,K ,M,PI ) 1 

COMPOSE  1 X.II.L.  PC) 

ENO 

EL5E 

BEGIN 

PPOCEOUPE  P3:  SOPTIL  .M.f  li 
PROCELHIPE  Pti  COMPOSE iN-X.K-l.M.PBII 
PPOCEOUPE  PSi  SOPTiM.L.llI 
PROCEDURE  P61  COMPOSE  1 N-X.K .M.PS) 1 

COMPOSE  1 » .V .L .PH ) 1 
COMPOSE  IX. K-l .L.PGI 
END: 

ENO  ISINGi 


i 


WRITE* *ICirrP£  UPPER  BOUND  FOR  fWXICI«,)i 
REROfUPPERli  MRITE('ICP)  I 
FOR  MAX  * 3 STEP  1 UNTIL  UPPER  00 
BEGIN 

IMXNI  » fWX-Ii 

FOP  ONES  > I STEP  1 UNTIL  NAXHl  DO 
BEGIN 

IM  - 1MN(0NES.HHX-0NES)I 
FOR  SHIFTS  . 1 STEP  I UNTIL  INI  00 
ISINGINPX.  ONES,  SHIFTS.  SEOUI I 

ENOi 

ENOi 

ENO  tWlNPROGRMli 


BASIC  VERSION  OF  PERT


300  DIM  I(300),L(300),M(300)
400  DIM  E(300),F(300),X(300)
410  N1  =  32
420  GOSUB  700
430  N1  =  27
440  GOSUB  700
450  STOP
700  REM  SUBROUTINE  WORK


000 

IU)  * 

1 

900  1 

L(I>  « 

2 

1000  1 

E(I>  *■ 

2.5 

1100 

1(2 1 * 

1 

1200 

L(2)  « 

3 

1300 

E(2>  « 

1.8 

1 400 

1(3)  * 

1 

1500 

Li  3)  * 

1 

1 BOO 

El  3)  * 

3.0 

1 TOO 

KD  ■ 

1 

IBM 

lid  s 

10 

1900 

Ed)  • 

IB . 1 

2i»«0 

K5»  * 

7 

Z1»»0 

1(5)  * 

5 

zzoo 

E(S>  * 

1.2 

2300 

I >6)  * 

2 

2100 

L(6>  * 

6 

2500 

El 6)  * 

3.8 

2600 

117)  • 

•» 

*. 

2700 

L(  7)  • 

7 

2800 

E(7)  « 

6.7 

2900 

KB)  * 

3 

3000 

LIB)  - 

6 

3100 

E ( 8 ) * 

11 

3200 

K9)  * 

3 

3300 

LI 9)  * 

7 

3100 

E 1 9 ) « 

1.3 

3500 

K 10) 

* 1 

3600 

LUO) 

= 7 

3700 

El  10) 

« 0.2 

3800 

KID 

-•  6 

3900 

Lim 

= 5 

1000 

E ( 1 1 ) 

6.6 

1100 

1(121 

6 

1200 

1112) 

8 

1300 

El  12) 

2.2 

1100 

K 13) 

7 

1500 

Li  13) 

8 

1600 

E)  13) 

1.9 

1700 

KID 

7 

1800 

Ll  ID 

9 

1900 

E(ID 

3-2 

5000 

1(15) 

7 

5100 

LI  19) 

10 

5200 

EI15) 

1.1 

5300 

II  16) 

5 

5100 

LI  16) 

11 

5500 

E(  16) 

6.0 

5800 

III?) 

8 

5700 

Lt  17) 

11 

5800 

EH7) 

6 0 

5900 

1 1 18) 

9 

6000 

Ll  16) 

11 

6100 

E ( 10 ) 

8.1 

6200 

K 19) 

10 

6300 

1(19) 

11 

6100 

El  19) 

0.7 

6500 

1 ( 20 ) 

5 

6600 

L(20) 

12 

6700 

El  20) 

1.8 

6800 

I»21) 

8 

6900 

L(21) 

12 

7000 

E ( 2 1 ) 

0.7 

7100 

1(22) 

8 

7200 

L(22) 

11 

7300 

D22) 

6.1 

7100 

IC3) 

10 

7500 

Li  23) 

11 

7600 

EI23) 

3.8 

7700 

Id) 

12 

7800 

LCD 

13 

7900 

E(2D 

0.2 

8000 

1(25) 

11 

8100 

LI  25) 

13 

8200 

El  25) 

2.5 

8300 

1(26) 

11 

8100 

L(26) 

11 

8500 

EC6) 

0.9 

8600 

1127) 

6 

8700 

L(2?) 

13 

8800 

i EC?) 

11.1 

8900 

i 1CB) 

i * 7 

9000  LI2BI  * S 
9100  EI2BI  • 6.0 
9200  K29I  * II 
9300  LI29)  ■ 12 


9400  E(29)  =  7.3

19100  UNI)  ■ LIN3I 

9500  1(30)  * 9 

19400  L«N3)  ■ IB 

9600  LOO)  * 12 

19500  NEXT  N3 

9700  EOO)  • 3.8 

19GOO  01  * 0.0 

9000  1(31)  • 14 

197(H)  FOP  13  ■ 1 10  N2  STEP 

9900  LOl)  * 13 

190.'H  F ( 13)  * XI 13) 

10000  £(31 ) - 0.7 

193...)  m 13)  . RQ5(f1f  13) ) 

10100  11 32)  * 1 

20000  IF  01  )*  XI 13)  THEN  26 

10200  LI 32)  ■ 10 

20DSD  01  ■ Xi  13) 

10300  £(32)  • 12.6 

20100  NEXT  13 

10S00  Tl  * 0.0 

20200  GO  TO  20900 

1O7O0  PEfl  COLL  PERT(N1.11.L.£1.T1,N2.L2.£2.X1) 

201.10  PEfl  COSE  2 

10000  G0SU8  13300 

20SOO  FOP  13  « 1 TO  N2  STEP 

10805  REfl  COLL  PUTOUT 

20GPO  fll  13)  • 0BS(f1(13)l 

10010  GOSU0  10910 

206S0  NEXT  13 

10030  RETURN 

20700  PE TURN 

10900 

209OO  GO  TO  11SO0 

10910  REfl  SU8RQUT1NE  PUTOUT 
11000  PRINT  NZ."  EVENTS" 

21000  ENO 

14 


11100  RETUPN 

11200  FOR  12  • 1 TO  N2  STEP  1 

11300  IF  0.001  > O0S( (F(I2>-Xtl2>))  THEN  11700 

1 MOO  PP.NT  Wl2).Fll2).Xt!2l 

11600  GO  TO  1190O 

11700  PRINT  t1ll2).F(I2),X(I21,"  CRITICAL" 
11900  NEXT  12 
11990  RETURN 
11990  PEH 

11999  PEH 

12000  REfl  SCONI 12. 11. L2) 

12010  REP1 

12200  IF  12  * 1 THEN  1300P 
121O0  FOR  LI  * 12-1  TO  1 STEP  -1 
12600  IF  14  ■ tl(Xl)  THEN  12700 
12650  GO  TO  12900 
12700  14  * K1 
12000  PE TURN 
12900  NEXT  K1 
13000  fit  12  > 

13100  14  ■ 12 
13200  12  » 12*1 
13290  PE TURN 
13290  PEH 

13299  PEfl 

13300  PEH  PERTIN1.11.L.E1.T1.N2.L2.E2.X11 
13400  REfl 
13600  12  - 1 

13700  FOR  N3  ■ 1 TO  N1  STEP  1 
13700  11  • LIN3I 
13790  GOSU0  12OO0 
13000  PEH  COLL  SCONI  12.LIN3)  .L21 
13010  LIN3I  * 11 
13000  11  * UN3I 
13090  GO5U0  12000 
13900  REfl  COLL  SCONI  12 . 1 (N3I  .L21 
13910  1IN3I  » 11 
11000  NEXT  N3 
14100  N2  * 12-1 
14200  IS  ■ I 
14300  01  . Tl 

I4SO0  REfl  WHILE  TPUC  00 
11000  K2  » N2 

11900  FOR  13  * I TO  N2  STEP  1 
15000  XI 13)  * 01 
1S0S0  NEXT  13 

15200  PEM  00  ' 000*  > WHILE  K2  X 0 

1SS00  FOR  N3  ■ 1 TO  Nl  STEP  1 
1S600  IF  0 .•  f1(IIN3)l  THEN  16900 
15000  PEfl  COSE  15  OF 

16000  ON  IS  GO  TO  16200.16600 
16200  X2  . O0SIX(IIN3)))*E<N3) 

16300  IF  O0SIXILIN3)))  X*  X2  THEN  16900 
16350  X(LIN31)  ■ -X2 
16400  GO  TO  16900 
16600  X2  • O0Sf XI |IN3) I1-EIN3) 

16700  IF  X2  >•  O0SfX(LIN3)))  THEN  16900 
IS 750  X ( L 1 N3 I ) » -X2 
16900  NEXT  N3 

17100  FOP  13  * 1 TO  N2  STEP  1 
17200  IF  fit  13)  >■  0 THEN  17000 
17300  IF  XI 13)  >»  0 0 THEN  10300 
17400  mi3)  ■ O0simi3)i 
1 7500  K2  « 1.2*1 
17600  XI 13)  » 0051X1 13)) 

17700  GO  TO  103*0 
170O0  IF  0 0 > XI 13)  THEN  10200 
17900  fit  13)  * -fit  13 ) 

10O0O  X2  . F2-1 
10100  GO  TO  10300 
10200  XI 13)  . 0051X11311 
103OO  NEXT  13 

10400  IF  12  . 0 0 THEN  10700 
101SO  GO  TO  1S200 
10700  ON  IS  GO  TO  19000. 20500 
18900  PEfl  COSE  1 
19000  IS  * 2 

19100  FOR  N3  • 1 TONI  STEP  1 
19200  16  * UN3) 




BLISS  VERSION  OF  PERT


MODULE  BLlllSISTACKUOOOTI  * 

BEGIN 

HBCPO  AOS'XI  ■ IIP  IX)  GEO  0.8  THEN  IXi  ELSE  FNEG  IXIlli 
MACRO  1A0SIXT  ■ (IF  (X)  GEO  0 THEN  IXI  ELSE  -IXII»i 

EXTEPNAL  OUTMSG.OECOUT.Fi.OUTi 
FOPWAPO  PUTOUTi 

OMN  NEVNTSi 

OWN  1 N 1 T 1 300 1 . LftST  1 300 1 . L 1 N».  1 300 1 1 
OWN  ESI IHE13O0I , EARLYSI 3001  .LATEF 13001 1 

STRUCTURE  VECT1111  * ( VECTH.  1-1  K0.3S:  I 
MAP  VECTI  INI T iLBST i LINK; ESI IME iEARLYSi LATEF j 

FUNCTION  PERKNNBX.18EG.JEN0. IE. ST. EM8X.LNf.ES. Mil  ■ 

BEGIN 

STPUCTUPE  PBPUEC1I1  * (a.PAPUEC* • 1-1  KB.36'1 
MAP  PBPVEC  IBEGi  JENOiLNKi TEiESiATT  I 

L0C8L  1EX.1SX.1TX.KXI 
LOCAL  BXX.XXXI 

FUNCTION  SCAN(TOBJ)  * 

BEGIN 

IF  . TEX  NEO  1 THEN 
BEGIN 

OECR  KX  FPOM  . 1EX-1  TO  1 BY  I DO 
IF  a.TOBJ  EOL  .INF  1 .KX)  THEN 
BEGIN  I.T0BJT<0.36>  * .KXl  RETURN  END 

ENOl 

LNKI.IEX)  • a.TOBJi  I . TOOJ l<0.36>  » . IEXi  1EX  . .1EX*|| 
ENOi  '.SCAN  i 

TEX  - li 

1NCP  NX  FPOM  1 TO  . NMAX  BY  1 00 
BEGIN  SCBNIJENOI  .NXXO.OYK  5CANI1BEGI  .NXhO.OM  i ENOi 
<.EMAXI<0,36>  ' .lEX-ll  1SX  • Ol  AXX  • .ST » 

WHILE  1 00  ! WHILE  TRUE  DO 

( KX  * a.EMAXi 

1NCR  1EXZ  FPOM  1 TO  f.EMHX  BY  1 00  ATT1.1EX21  ♦ .HXXl 

00  t DO  <BOOY>  WHILE  KX  NED  0. 

( I NCR  NX  FPOM  1 TO  • NHHX  BY  1 00 
BEGIN 

IF  .LNK1. IBEGI. NX11  GTR  0 THEN 
BEGIN 

CASE  . 1SX  OF 
SET 

! CASE  1 
BEGIN 

XXX  . AOSI  .Bill.  IBEGI. NX11!  FHOR  .TEI.NXH 
IF  .XXX  GTP  ABS f . AT T C . JENO I • N < 1 1 T 
THEN  ATTI  .JEN0I.NX11  « FNEG1.XXXU 

ENOi 

1 CASE  2 
BEGIN 

XXX  - ABSf.HlTl. IBEGI. Nxlli  F5DP  .TEI.NXli 
IF  .xxx  LSS  AOSI. ATTI. JENOT.Ntl I) 

THEN  AT1I.JEN0I.NXII  • FNEGl.XXXII 

ENOi 

TESi 

ENOi 

ENOi 

I NCR  1 EX?  FROM  1 TO  a.EMAX  BY  1 00 
BEGIN 

IF  .LNM.1EX2I  LSS  0 THEN 
BEGIN 

IF  .ATTI.IEX2I  LSS  0 THEN 
BEGIN 

LNF1.IEXZI  * TABSi.LM  I.1EX2IK  KX  . .FXMi 
ATTI.IEX21  * AOSI. ATTI.  1EX21K 
ENOi 

ENO  ELSE 

IF  .ATTI.1EX2I  GEO  0 THEN 

BEGIN  LNM  1EX2I  - -.LNI.I.1EX2H  KX  . .KX-li  END 
ELSE  ATTI . 1EX21  - ABS( .ATTI . 1EX2I I i 
ENO  i 

I WHILE  .KX  NEO  Ol 

CASE  1SX  OF 
SET 


ENOi 

AXX  * Ol 

1 NCR  1EX2  FROM  1 TO  a.EMAX  BY  1 00 
BEGIN 

ESI.1EXZI  • . ATTI . 1EX2)  i 
LNKI.1EX2I  * IAOSI.LNKI . TEX21 K 
IF  .ATTI . TEXZ)  GTP  AXX  THEN  AXX  » .ATTl.lEXZli 
ENOi 

END:  I OF  CASE  1 


! CASE  2 
BEGIN 

1NCR  IEX2  FROM  1 TO  a.EMAX  BY  1 DO 
LNlT.lEXZl  - 1A85I.LNM  1EX21H 
RETURN 

ENOi  I OF  CASE  2 
TESi 

I ! ENO  OF  WHILE  TPUE  DO  LOOP. 

ENOi  ! PERT i 


a 

u 

LAST  II  1 

a. 

■» . 

ESTIMEI  11 

a 

2. Si 

a 

)l 

LASTI2I 

a 

3! 

EST 1 ME 121 

a 

l.Bi 

* 

li 

LAST  131 

a. 

il 

LST1MEI 31 

a 

3.0i 

i» 

LAST  1 1 1 

a 

10; 

ESTIMEItl 

IB.  1 1 

► 

Cl 

LASTISI 

a 

Si 

ESTIMEISI 

t.Zi 

a 

2: 

LAS1IBI 

a 

G; 

ESTIMEIB1 

B.Bi 

* 

Zi 

LAST  171 

?s 

ESTIMEI?) 

B . 7 1 

3 » 

LASTIB) 

a 

Gi 

EST1MEIB1 

Mi 

3 

LASTISI 

a 

7; 

ESTIMEISI 

1 . 3i 

FUNCTION  W0PFINAC151  » 
BEGIN 

LOCAL  TSTARli 


IN1 1 1 1 1 

INITIO 

INI T 1 31 

INIlltl 

1N1TISI 

INITIB) 

INIT l 21 

INtTLBI 

INITI9I 

INI  II 101 

INITIIII 

INI T I 12) 

INI T 1 13 1 

INITIMI 

INI  I i IS  I 

1NIT1 1 IS  1 

IN1TII71 

1N1TIIBI 

INI Tl 131 

INI  1 1201 

IN1 1 1 21 1 

INIT122! 

INI T l 231 

IN1TI2K 

IN1TI2S1 

INITI2B) 

INIT122I 

INITI2BI 

INITI291 

TNI T 1 301 

INI T I 31  I 

1NIII32I 


► ti 

- G. 

• Si 
» ?i 
. ?i 

► 7i 

• Si 

- Bi 

• 9i 

► lOl 
‘ Si 

. Bi 

- B: 

• lOl 
» I2i 

► 111 
► 111 

• BI 
► ?l 

• III 
. 91 

• Iti 
» ti 


LAST  I IOI 
LAST  1 1 1 1 > 
l AST  1 121  * 
LAST  1(31  ► 
LAST l I t 1 - 
LMST  MSI  <• 
LAST l IB  I ► 
LASH  171  ► 
LAST  I 10  ) 
LASTII9I 
LAST  1 20 1 * 
LASI12II  ► 
LAST122I  * 
LAST  1231  - 
LASTIOtl 
l AST  1 2S I 
LAST  1 2B I 
LA5II27I  • 
LAST  1 2B I • 
LASII29I 
LAST130I  • 
LHS! I 31 1 
LAST  132)  • 


?i 

Si 

Bi 

Bi 

9i 

101 

III 

III 

III 

> Ill 
1 2 1 
12* 
It! 
Itl 

• 13: 

> 1 3 1 

► lti 

1 3 1 
Si 

► 12 1 
12 1 

► 13i 
lOi 


ESTIMEI 101  * 0 . 2 i 
EST1ME1 ) 1 1 ♦ B.Bi 
ESTIMEI 121  * 2.2: 

E S 1 1 ME  1131  * t.9i 
ESTIMEI It  1 » 3. 2 1 
LSI 1MEI IS1  * 1 . 1 i 
ESTIMEI IB)  » B.Oi 
ESTIMEI 171  » B.Bi 
ESTIMEI IB)  - B. 1 i 
ESTIMEI 191  * 0.7i 
E5T1MEI201  » t.Bi 
ESI  IKE  121 1 * 0.7| 
EST 1 ME  1 22 1 » B.ti 
EST I ME  1231  * 3.Bi 
EST I ME  1 2t 1 - 0. 2 1 
EST 1 ME  1 25 1 ► 2. Si 
ESTIME1261  * 0.9i 
EST1MEI271  » II. li 
E5T1MEI2B1  » B.Oi 
ESTIMEI 291  ► 7. 3i 
EST1MEI30I  - 3.Bi 
E5T1MEI311  ► B.7| 
EST 1 ME  1 32 1 » 12. Bi 


T51APT  • 0.01 

PEPII  -NiCIS. IN1K0. 0>.LAST<0.0>. EST  1MET0.0X..TSTART. 

NEVNIS-  A. O', 1 1NFT0.O  ' •EAPLYS'.O.Ox  .LATEF <0.0X1 1 
PUTOUT I ,NEON1S.LINI  <0.0  ■ .EAPLYS'O. Ox. LATEF <0.0»  I 


ENOi 


P0U1INE  WORT 


POUT  INF  PUTOUT  I NEO. LI’  .EAS.XLF  I * 

BEGIN 

STRUCTURE  PAPULCI  I 1 * U.PHPVEC*.  1-1  K0.36>l 
MaP  PaPUEC  LKiEASiXLT i 


OUTMSGlOPl  IT  ":’M"’J  * 1 1 OCCOUTIO.t.  .NEVll 
0UTH5C.t0.PLIl  1 EUr.NT5”?MT*J ' 1 1 
PE  TUPS i 

1NCP  IX  FROM  I TO  .NEV  BY  ) Du 
BEGIN 

OUTMSG'O.PLIT  • •PM-’J  * ) : OECOUMO.Y..LKI.  1X1)1 

FLOUT  'A,  .EASl . IX  I.IO.t  1 .*  FLOUT  10. . Xl.FI . IX  I . lO.t  1 1 
ir  AB5H.rMSl.lXI  F5BP  XIF1.IXIII  L55  0.001 
1HEN  OUTMSG'O.PLIT  1 CPITICAL'K 

ENOi 

OUTMSG'O.PLIT  ’ 7M’’J ' 1 1 
ENOi  ' PDUTINE  PUTOUT 


WORM  32 1 1 
WORM  271 1 


ENO 

ELUOOM 


E-B 


| 


1 CASE  1 

BEGIN 
ISX  - 1 i 

1F.CP  NX  FROM  I TO  . NMAX  BY  1 DO 
BCG  IN 

1TX  - . IBEGI. NXH  IBEGI. NXI  ► .JENOl.NXli 
JEN0I.NX1  * • ITXi 




FORTRAN  VERSION  OF  PERT


      CALL WORK(32)
      CALL WORK(27)
      END

      SUBROUTINE WORK(NACTS)
C     INITIALIZE DATA AND CALL THE PROPER STUFF.
      DIMENSION INIT(300),LAST(300),LINK(300)
      DIMENSION ESTIME(300),EARLYS(300),XLATEF(300)

1N1T ( 1 ) • I 
LAST I 1 1 * 2 
ESTIMEI I > * 2.S 
IN  IT  I 2 * * 1 
LAST (2)  * 3 
EST1ME 1 2 • * 1.9 
IN  1 T l 3 I • 1 
LAST  13*  « 4 
EST1MEI3I  * 3.0 
INI  TIT  I = I 
LAST  I 4 I ■ 10 
EST1MEI4I  - 19.4 
1N1TISI  * 2 
LAST  IS)  « S 
ESTIflEISI  * 4.2 
IN  1 Ti 6 1 « 2 
L AS  TIG)  « 6 
ESTIMEI6)  * 3.9 
1NITI7)  * 2 
LAST  I 7)  * 7 
ESIIMEO  = 6.7 
IN  I T I 9 ) > 3 
LASTigt  * G 
ESTIHEOI  ■ 1.1 
IN1TI9)  ■ 3 
LAST  f 9 ) » 7 
EST 1 ME  1 9 ) • 1.3 
INI Ti 101  - T 
LASTIIO)  * 7 
EST 1MEI 101  ■ 0.2 
IN1TIIII  • G 
last  (in  - s 
ESIIMEIllI  - G.G 
INI T 1 12 1 - 6 
LAST (121  * 9 
EST1MEU2I  - 2 2 
1NITU31  . 7 
LASH  131  * 9 
ISI1HEH3I  * 4.9 
INI TI IT  > . 7 
LAST  I IT  I ■ 9 
ESTIHEIITI  > 3.2 


INITHSI  ■ 

7 

LhSTHS)  ■ 

10 

ESTIMEI 15) 

* 1.1 

INITHGI  * 

S 

LAST ( 1G)  * 

11 

ESTIMEI IG1 

= 60 

INITi 17)  • 

0 

LAST ' 1 7)  * 

11 

EST  1 ME  < I ? 1 

* 6.0 

INITI 19)  " 

9 

LASH  19)  - 

11 

EST I ME ( 19) 

* 8.1 

INITi 191  . 

10 

LAST  1191  * 

11 

ESTIMEI 19) 

* 0.7 

IN  IT i 20)  = 

S 

LAST ' 201  = 

12 

ESTIMEI 201 

* 1.0 

INITI21 I • 

0 

LASTI2I I * 

12 

ESTIMEI 21 J 

* 0.7 

INITI22I  * 

0 

LAST  1 22 1 * 

11 

EST I ME  1 22  > 

= 6.1 

INITi 23 1 * 

10 

LAST  1 231  - 

H 

ESTIMEI 23) 

• 3.0 

INIT124  1 * 

12 

LAST  1241  * 

13 

ESTIMEI24) 

* 0.2 

INITI7SI  * 

11 

LASTI2S)  - 

13 

ESTIMEI2S) 

- 2.S 

INITI TG 1 ■ 

11 

LAST  1261  . 

11 

ESTIMEI26) 

* 0.9 

INIH27)  • 

6 

LASH27)  ■ 

13 

EST1NCI2?) 

: 11.1 

INITI29I  « 

7 

LAST  1291  « 

5 

EST I ME  1 29 1 

*6.0 

INI!'  ’'91  ■ 

11 

LAST  231  • 

12 

ESTIMEI 29 1 

« 7.3 

INI TI 30 > . 9 
lnSTI30i  • 12 
EST IKE  1 30 1 * 3 9 
INI  11311  > IT 
LhST 1311  - 13 
estimei?, ii  ■(.? 

1N1TI 321  * T 
LAST  1 3: I • 10 
ESI  IKE  1 32*  * 12  6 

T5TAPT  = 0.0 

      CALL PERT (NACTS,INIT,LAST,ESTIME,TSTART,
     1 NEVNTS,LINK,EARLYS,XLATEF)

      CALL PUTOUT(NEVNTS,LINK,EARLYS,XLATEF)
      RETURN


      SUBROUTINE PUTOUT(NEV,LK,EAS,XLF)
      DIMENSION LK(1),EAS(1),XLF(1)

      TYPE 1000,NEV
 1000 FORMAT (1X,I4,7H EVENTS)
      RETURN

      DO 1 IX = 1,NEV,1
      IF (ABS((EAS(IX)-XLF(IX))) .LT. 0.001) GO TO 2
      TYPE 1001,LK(IX),EAS(IX),XLF(IX)
 1001 FORMAT (1X,I4,2F14.4)
      GO TO 1
    2 TYPE 1002,LK(IX),EAS(IX),XLF(IX)
 1002 FORMAT (1X,I4,2F14.4,9H CRITICAL)
    1 CONTINUE

      RETURN
      END

      SUBROUTINE SCAN(IEX,ITOBJ,LNK)

      DIMENSION LNK(1)
      IF (IEX .EQ. 1) GO TO 1
      LUCY = IEX-1
      DO 2 KX2 = 1,LUCY,1
      KX = LUCY-KX2+1

      IF (ITOBJ .NE. LNK(KX)) GO TO 2
      ITOBJ = KX
      RETURN
    2 CONTINUE
    1 LNK(IEX) = ITOBJ
      ITOBJ = IEX
      IEX = IEX+1
      END
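
SCAN maps an external event number onto a dense internal index: it searches the link table backwards from the most recent entry and appends the event if it has not been seen before. A minimal Python sketch of that behaviour follows. The function name scan, the use of a growing list in place of the LNK array with its IEX counter, and the returned 1-based index are assumptions made for illustration.

```python
def scan(link, event):
    """Return the 1-based internal index of `event`, adding it if new.

    `link` plays the role of the LNK table; its length stands in for IEX-1.
    Illustrative sketch only.
    """
    # Search backwards from the most recently added entry.
    for kx in range(len(link), 0, -1):
        if link[kx - 1] == event:
            return kx
    # Not found: append the event and return its new index.
    link.append(event)
    return len(link)
```

Because the search is linear, renumbering all events is quadratic in the number of events, which matters little for the 32-activity test network used here.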

      SUBROUTINE PERT(NMAX,IBEG,JEND,TE,ST,MAXE,LNK,ES,AT)
      DIMENSION IBEG(1),JEND(1),LNK(1),TE(1),ES(1),AT(1)

      IEX = 1

      DO 1 NX = 1,NMAX,1

      CALL SCAN(IEX,JEND(NX),LNK)
      CALL SCAN(IEX,IBEG(NX),LNK)

    1 CONTINUE
      MAXE = IEX-1
      ISX = 1

      AXX = ST

C     WHILE TRUE DO

    2 CONTINUE

      KX = MAXE

      DO 3 IEX2 = 1,MAXE,1

    3 AT(IEX2) = AXX

C     DO <BODY> WHILE KX NE 0

    6 CONTINUE

      DO 4 NX = 1,NMAX,1

      IF (LNK(IBEG(NX)) .LE. 0) GO TO 4

C     CASE ISX OF

      GO TO (101,102),ISX

  101 XXX = ABS(AT(IBEG(NX)))+TE(NX)

      IF (XXX .GT. ABS(AT(JEND(NX)))) AT(JEND(NX)) = -XXX
      GO TO 4

  102 XXX = ABS(AT(IBEG(NX)))-TE(NX)

      IF (XXX .LT. ABS(AT(JEND(NX)))) AT(JEND(NX)) = -XXX

    4 CONTINUE

      DO 7 IEX2 = 1,MAXE,1

      IF (LNK(IEX2) .GE. 0) GO TO 9
      IF (AT(IEX2) .GE. 0.0) GO TO 7
      LNK(IEX2) = IABS(LNK(IEX2))
      KX = KX+1

      AT(IEX2) = ABS(AT(IEX2))
      GO TO 7

    9 IF (AT(IEX2) .LT. 0.0) GO TO 8
      LNK(IEX2) = -LNK(IEX2)

      KX = KX-1
      GO TO 7

    8 AT(IEX2) = ABS(AT(IEX2))
    7 CONTINUE

      IF (KX .NE. 0) GO TO 6

C     END OF DO <BODY> WHILE KX NE 0.

      GO TO (201,202),ISX

C     CASE 1
  201 ISX = 2

      DO 12 NX = 1,NMAX,1
      ITX = IBEG(NX)
      IBEG(NX) = JEND(NX)
      JEND(NX) = ITX
   12 CONTINUE
      AXX = 0.0

      DO 11 IEX2 = 1,MAXE,1
      ES(IEX2) = AT(IEX2)
      LNK(IEX2) = IABS(LNK(IEX2))
      IF (AT(IEX2) .GT. AXX) AXX = AT(IEX2)
   11 CONTINUE
      GO TO 200

C     CASE 2

  202 DO 10 IEX2 = 1,MAXE,1
   10 LNK(IEX2) = IABS(LNK(IEX2))
      RETURN
  200 CONTINUE
      GO TO 2

C     END OF WHILE TRUE DO LOOP.
      END
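
The two passes made by PERT, and the critical-event test made by PUTOUT, can be sketched in a few lines of Python. This is only an illustrative sketch and not the thesis's implementation: the names `pert` and `critical_events` and the tuple layout of `activities` are assumptions, event numbers are used directly as dictionary keys instead of being renumbered by SCAN, and the network is assumed to be acyclic.

```python
def pert(activities, t_start=0.0):
    """activities: list of (begin_event, end_event, duration) tuples.

    Returns two dicts keyed by event: earliest start and latest finish.
    Assumes an acyclic activity network (otherwise the loops never end).
    """
    events = set()
    for begin, end, _ in activities:
        events.update((begin, end))

    # Forward pass: repeat until no earliest time is pushed later.
    earliest = {ev: t_start for ev in events}
    changed = True
    while changed:
        changed = False
        for begin, end, dur in activities:
            t = earliest[begin] + dur
            if t > earliest[end]:
                earliest[end] = t
                changed = True

    # Backward pass: start from the project finish time and
    # repeat until no latest time is pulled earlier.
    finish = max(earliest.values())
    latest = {ev: finish for ev in events}
    changed = True
    while changed:
        changed = False
        for begin, end, dur in activities:
            t = latest[end] - dur
            if t < latest[begin]:
                latest[begin] = t
                changed = True
    return earliest, latest


def critical_events(earliest, latest, tol=0.001):
    """Events whose earliest and latest times agree within tol."""
    return sorted(ev for ev in earliest
                  if abs(earliest[ev] - latest[ev]) < tol)
```

The tolerance of 0.001 matches the comparison in PUTOUT. The FORTRAN listing additionally skips activities whose begin event did not change in the previous sweep; the sketch omits that bookkeeping for brevity.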



THE 5 VERSIONS OF AITKEN, ALL IN ONE PROGRAM.
VERSION SELECTED BY CASE INDEX


MODULE INTERPOL(STACK, TIMER=EXTERNAL(SIX12)) =
BEGIN

GLOBAL                 ! VARIABLES INITIALIZED BY DDT
    MAXC,              ! UPPER LIMIT FOR LOOP
    HSTEP,             ! STEP LENGTH DURING INTERPOLATION LOOP
    TRACECASE;         ! SELECTS ROUTINE TO BE TRACED

BIND

    NMAX = 10,

thgsie  * roil 


! MAXIMAL NUMBER OF POINTS.
! SIZE OF FUNCTION TABLE.

OWN
    X,
    ABSCIS[TABSIZ],    ! ABSCISSAE OF FUNCTION TABLE
    ORDIN[TABSIZ];     ! FUNCTION VALUES.

EXTERNAL
    LOG;



VERSION A

ROUTINE AAITKEN(XT,YT,XX,N,L) =
BEGIN
    REGISTER HI, LO, I;

    OWN
        X[NMAX],       ! ABSCISSAE
        DX[NMAX],      ! ABSCISSAE DIFFERENCE.
        Y[NMAX],       ! OLD FUNCTION VALUES.
        Z[NMAX];       ! NEW FUNCTION VALUES.

    STRUCTURE PARVEC[I] = (@.PARVEC+.I)<0,36>;
    MAP PARVEC XT:YT;

    IF .XX GEQ .XT[.L] THEN RETURN .YT[.L];

    ! PREPARE AND PERFORM BINARY SEARCH FOR RIGHT INTERVAL.

    LO ← 0; HI ← .L; I ← .L/2;

    WHILE (.HI-.LO) GTR 1 DO
        ( ! LOOP INVARIANT IS:
          ! .XT[.LO] LEQ .XX LSS .XT[.HI]

          IF .XX EQL .XT[.LO] THEN RETURN .YT[.LO];
          IF .XX LSS .XT[.I] THEN HI ← .I ELSE LO ← .I;
          I ← (.HI+.LO)/2
        );

    ! NOW .XT[.LO] LEQ .XX LSS .XT[.LO+1]

    IF (LO ← .LO-.N/2+1) LSS 0 THEN LO ← 0;
    IF .LO + .N - 1 GTR .L THEN LO ← .L-.N+1;

    ! NOW READY TO INTERPOLATE,
    ! USING POINTS .LO, .LO+1, ..., .LO+.N-1.
    ! FIRST INITIALIZE LOCAL TABLE.

    LO ← .LO-1;

    INCR J FROM 0 TO .N-1 DO
        ( X[.J] ← .XT[LO ← .LO+1];
          Y[.J] ← .YT[.LO];
          DX[.J] ← .XT[.LO] FSBR .XX;
          ! OUTINT(.J); OUTINT(.LO); OUTFLT(.X[.J]);
          ! OUTFLT(.Y[.J]); OUTFLT(.DX[.J]); CRLF();
        );

    ! NOW COMPUTE SUCCESSIVE APPROXIMATIONS
    ! USING SUCCESSIVELY MORE POINTS

    INCR J FROM 0 TO .N-2 DO
        ( INCR K FROM .J+1 TO .N-1 DO
            ( Z[.K] ← ((.Y[.J] FMPR .DX[.K]) FSBR (.Y[.K] FMPR .DX[.J]))
                      FDVR (.X[.K] FSBR .X[.J]);
              ! OUTFLT(.Z[.K]); CRLF();
            );
          INCR K FROM .J+1 TO .N-1 DO Y[.K] ← .Z[.K]
        );

    ! NOW READY TO DELIVER VALUE:

    .Z[.N-1]

END; ! ROUTINE AAITKEN.
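
The elimination step of Version A, Z[K] = (Y[J]*DX[K] - Y[K]*DX[J]) / (X[K] - X[J]), can be tried out with the short Python sketch below. It is a hedged illustration rather than a translation of the listing: the function name `aitken` is hypothetical, the base points are assumed to be already selected and distinct, and the separate Z table is folded into an in-place update of `y`.

```python
def aitken(xs, ys, x):
    """Aitken's iterated interpolation through the points (xs[i], ys[i]).

    xs must be distinct; returns the value at x of the polynomial
    passing through all given points.
    """
    n = len(xs)
    dx = [xi - x for xi in xs]      # abscissa differences, as DX in Version A
    y = list(ys)                    # working copy of the function values
    for j in range(n - 1):
        for k in range(j + 1, n):
            # y[j] is final for this pass; y[k] still holds the previous pass.
            y[k] = (y[j] * dx[k] - y[k] * dx[j]) / (xs[k] - xs[j])
    return y[n - 1]
```

For example, with the three points of f(x) = x*x at 0, 1 and 2, `aitken([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)` reproduces the quadratic exactly, up to rounding.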



ROUTINE INDEX(XTAB,L,N,X) =
BEGIN

    ! FIND THE INDEX OF THE ELEMENT IN XTAB WHICH IS THE FIRST
    ! OF THE N ELEMENTS CLOSEST TO X

    STRUCTURE IVEC[I] = (@.IVEC+.I)<0,36>;
    MAP IVEC XTAB;

    LOCAL K,S,T;

    ! FIND K S.T. .XTAB[.K] LEQ .XTAB[.K+1]
    INCR I FROM 1 TO .L DO
        ( IF .X EQL .XTAB[.I] THEN (K ← .I; EXITLOOP);
          IF .X LEQ .XTAB[.I] THEN (K ← .I-1; EXITLOOP);
        );

    ! FIND START AND FINISH ELEMENTS DISREGARDING XTAB ARRAY BOUNDS.

    S ← .K-.N/2+1; T ← .K+.N/2;
    IF (.N MOD 2) EQL 1 THEN
        IF (.X FSBR .XTAB[.K]) LSS (.XTAB[.K+1] FSBR .X)
            THEN S ← .S-1 ELSE T ← .T+1;

    ! ADJUST START ELEMENT TO CONFORM TO ARRAY BOUNDS.

    IF .S LSS 0 THEN S ← 0 ELSE IF .T GTR .L THEN S ← .S-.T+.L;

END; ! ROUTINE INDEX.


VERSION G

FUNCTION GAITKEN(XTAB,YTAB,X,N,L) =

BEGIN LOCAL XX[10], YY[10], LB; BIND N1=.N-1;

    ! XX WILL HOLD X[I]-X FOR THE DATA POINTS CHOSEN, AND
    ! YY THE INTERPOLATED VALUES.

    BIND XT=.XTAB, YT=.YTAB; MAP XT,YT;
    LB ← ( LOCAL I,K;
           I ← 0;

           WHILE .X GTR .XT[.I] AND .I LSS .L DO I ← .I+1;
           ! I NOW HOLDS THE INDEX OF THE FIRST X[..]
           ! THAT IS GEQ X.

           K ← .I-.N/2;

           IF .K LSS 0 THEN 0
           ELSE IF .K GTR .L-.N+1 THEN .L-.N+1
           ELSE .K );

    ! LB NOW HOLDS THE INDEX OF OUR SMALLEST BASE POINT.
    ! INITIALIZE XX AND YY.

    INCR I FROM 0 TO N1 DO
        ( XX[.I] ← .XT[.LB+.I] FSBR .X;
          YY[.I] ← .YT[.LB+.I] );

    ! INTERPOLATION EXACTLY ACCORDING TO
    ! SCHEME OF GIVEN REFERENCE.
    ! EACH I-ITERATION GIVES VALUES OF I-TH DEGREE.

    INCR I FROM 1 TO N1 DO
        ( MACRO I1=.I-1$;
          INCR J FROM .I TO N1 DO
              YY[.J] ← ((.YY[I1] FMPR .XX[.J])
                        FSBR (.YY[.J] FMPR .XX[I1]))
                       FDVR (.XX[.J] FSBR .XX[I1]) );

    .YY[N1]

END; ! GAITKEN
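
The base-point selection used by GAITKEN (find the first abscissa not less than X, step back N/2 entries, then clamp to the table bounds) can be sketched in Python as follows. The name `window_start` is hypothetical, the table is assumed to be sorted in ascending order, and the standard `bisect` module replaces the linear scan of the listing, so this is an illustration under those assumptions and not the thesis's code.

```python
from bisect import bisect_left


def window_start(xt, x, n):
    """Index of the first of n consecutive base points around x.

    xt is an ascending table of abscissae; n must not exceed len(xt).
    """
    last = len(xt) - 1                  # plays the role of L in the listing
    i = min(bisect_left(xt, x), last)   # first entry >= x, capped at last
    k = i - n // 2                      # centre the window on x
    return max(0, min(k, last - n + 1)) # clamp to the table bounds
```

For a table `[0, 1, 2, 3, 4, 5]` and `n = 4`, a value of 2.5 selects the window starting at index 1, while values outside the table fall back to the first or last admissible window.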

ROUTINE LAITKEN(XTAB,YTAB,X,N,L) =

BEGIN

    ! N POINT INTERPOLATION.

    STRUCTURE IVEC[I] = (@.IVEC+.I)<0,36>;

    STRUCTURE MATRIX[I,J] = [I*J] (.MATRIX+.J*J+.I)<0,36>;

    MACRO DET(A,B,C,D) = ((A FMPR D) FSBR (B FMPR C))$;

    OWN MATRIX IN[10,10];
    OWN XC[10];
    LOCAL J;

    MAP IVEC XTAB:YTAB;

    J ← INDEX(.XTAB,.L,.N,.X);

    ! INITIALIZE XC[0..N-1] TO .XTAB[.J..J+.N-1]
    INCR I FROM 0 TO .N-1 DO XC[.I] ← .XTAB[.I+.J];

    ! INITIALIZE IN[0..N-1,0] TO YTAB[.J..J+.N-1]
    INCR I FROM 0 TO .N-1 DO IN[.I,0] ← .YTAB[.I+.J];

    ! GO

    INCR J FROM 1 TO .N-1 DO
        INCR K FROM 1 TO .J DO
            IN[.J,.K] ← DET(.IN[.K-1,.K-1],(.XC[.K-1] FSBR .X),
                            .IN[.J,.K-1],(.XC[.J] FSBR .X))
                        FDVR (.XC[.J] FSBR .XC[.K-1]);

    RETURN .IN[.N-1,.N-1]
END; ! ROUTINE LAITKEN.


ROUTINE BAITKEN(XTAB,YTAB,X,N,L) =

BEGIN

    STRUCTURE IVEC[I] = (@.IVEC+.I)<0,36>;
    MAP IVEC XTAB:YTAB;

    OWN VECTOR C[10],XX[10];

    REGISTER B,E;

    B ← XTAB[0]; E ← XTAB[.L-1];

    WHILE (.E-.B) GTR 1 DO

        IF @((.B+.E)/2) GTR .X THEN E ← (.B+.E)/2 ELSE B ← (.B+.E)/2;

    IF (B ← .B-.N/2+1) LSS XTAB[0] THEN B ← XTAB[0]

    ELSE IF .B GTR XTAB[.L-.N+1] THEN B ← XTAB[.L-.N+1];
    E ← YTAB[.B-XTAB[0]];

    DECR I FROM .N-1 TO 0 DO

        ( XX[.I] ← @.B; C[.I] ← @.E; B ← .B+1; E ← .E+1 );

    DECR I FROM .N-1 TO 1 DO
        DECR J FROM .I-1 TO 0 DO
            C[.J] ← ((.C[.I] FMPR (.XX[.J] FSBR .X)) FSBR
                     (.C[.J] FMPR (.XX[.I] FSBR .X))) FDVR
                    (.XX[.J] FSBR .XX[.I]);

    .C[0]

END; ! ROUTINE BAITKEN



VERSION E

ROUTINE EAITKEN(XTAB,YTAB,XP,N,L) =

BEGIN

    OWN VECTOR C[10],XX[10],XXX[10];

    BEGIN ! THIS BLOCK SAVES ONE INSTR. IN THE ENTRY CODE AND ONE
          ! IN THE EXIT CODE SINCE WE NOW ONLY USE 1 REGISTERS.
        REGISTER B,E,X;

        B ← .XTAB; E ← .XTAB+2+.L-.N; X ← .XP;

        WHILE .E GTR (.B+1) DO

            IF @((.B+.E)/2) GTR .X THEN E ← (.B+.E)/2 ELSE B ← (.B+.E)/2;

        IF (B ← .B-.N/2+1) LSS .XTAB THEN B ← .XTAB;

        E ← .YTAB+.B-.XTAB;

        DECR I FROM .N-1 TO 0 DO

            ( XXX[.I] ← (XX[.I] ← @.B) FSBR .X; C[.I] ← @.E;
              B ← .B+1; E ← .E+1 )

    END; ! OF THE BLOCK THAT SAVES US ENTRY/EXIT CODE.

    DECR I FROM .N-1 TO 1 DO
        DECR J FROM .I-1 TO 0 DO

            C[.J] ← ((.C[.I] FMPR .XXX[.J]) FSBR
                     (.C[.J] FMPR .XXX[.I])) FDVR
                    (.XX[.J] FSBR .XX[.I]);

    .C[0]

END; ! ROUTINE EAITKEN


ROUTINE TEST(IRO,H0) =
BEGIN

    LOCAL
        J,
        H, HMAX, HMIN,
        X,
        Y,
        DY,
        FACT;

    H ← .H0; FACT ← 1.05; X ← 1.0;

    HMAX ← .H0 FMPR 3.0; HMIN ← .H0 FMPR 0.2;

    INCR I FROM 0 TO TABSIZ-1 BY 1 DO
        ( ABSCIS[.I] ← .X;

IF  .1  GTR  0 THEN  ( IF  .ABSCISI  • 1 1 LEO  . fiBSCISI  .1-11  THEN  U 
          ORDIN[.I] ← LOG(.X);
          X ← .X FADR .H; H ← .H FMPR .FACT;
          IF .H GTR .HMAX THEN (X ← .X FADR .H FDVR 3.0;
                                FACT ← 0.95);
          IF .H LSS .HMIN THEN FACT ← 1.05;

        );

    INCR COUNT FROM 1 TO .MAXC DO
        ( X ← 1.0; H ← .HSTEP;
          WHILE .X LEQ .ABSCIS[TABSIZ-1] DO
              ( INCR I FROM 2 TO NMAX DO

                    (.IRO)(ABSCIS<0,0>,ORDIN<0,0>,.X,.I,TABSIZ-1);
                X ← .X FADR .H

              );

        ); ! END OF TIMING LOOP
END; ! OF ROUTINE TEST.

CASE .TRACECASE OF
SET

%0% TEST(AAITKEN<0,0>,0.1);

%1% TEST(LAITKEN<0,0>,0.1);
%2% TEST(GAITKEN<0,0>,0.1);
%3% TEST(BAITKEN<0,0>,0.1);
%4% TEST(EAITKEN<0,0>,0.1);
TES;


END
ELUDOM


